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Abstract 

Autoregressive-moving  average  (ARMA)  models,  and  their  autoregressive  (AR)  counter¬ 
parts,  are  useful  approxlmants  to  the  kinds  of  random  processes  commonly  encountered 
in  discrete-time  signal  processing  applications.  Such  models  may  be  used  to  compress 
data  in  low  bit-rate  information  transmission,  improve  frequency  resolution  in  spec- 
trias  analysis,  and  to  forecast  in  economic,  meteorological,  and  other  time  series. 

In  this  paper  ve  discuss  several  aspects  of  the  maximum  likelihood  theory  of  parameter 
identification  in  ARMA  and  AR  models.  Ue  highlight  the  initial  condition  problem 
encountered  when  identifying  ARMA  or  AR  models  from  finite  data  records  and  propose 
several  methods  for  computing  exact  and  approximate  likelihood.  Several  new  interpre¬ 
tations  are  given  for  the  innovation  representation  of  an  ARMA  process.  Computation¬ 
ally  efficient  lattice  and  fast  Kalman  filters  are  proposed  for  the  computation  of 
exact  likelihood. 


1.  INTRODUCTION 

The  random  processes  encountered  in  signal  pro¬ 
cessing  applications  are  typically  lowpass  or 
bandpass  processes  in  which  redundancy  is  high. 
This  means  finite-dimensional  models  may  often 
be  used  to  approximate  the  second-order  proper¬ 
ties  of  the  processes.  The  dominant  motivations 
for  using  finite-dimensional  models  are  (1) 
they  provide  a  systematic  framework  for  deriv¬ 
ing  data  compression  and  frequency  resolution 
Improving  algorithms,  and  (2)  they  become  pre¬ 
dictor  formulae  for  event  forecasting. 

The  problems  of  data  compression,  resolution 
improvement,  and  forecasting  are  "solved",  so  to 
speak,  by  identifying  a  parametric  model  that 
either  infinitely  extends  a  data  correlation 
sequence  or  matches  the  data,  itself,  in  a 
least  squares  or  maximum  likelihood  sense. 

The  catch  in  all  of  this  is  model  selection  and 
parameter  identification  within  the  model. 
Autoregressive  (AR)  models  are  by  far  the  sim¬ 
plest  parametric  models  to  Identify.  Exact 
maximum  likelihood  theory  leads  to  nonlinear 
methods,  even  in  the  AR  case.  However,  the 
best  AR  predictor  is  finite-memory  and  linear 
identification  procedures  Involving  Toeplitz 
normal  equations  or  non-Tocplitz  Yule-Walker 

1This  work  supported  by  the  Army  Research 
Office,  Research  Triangle  Park,  NO  27709, 
under  contract  DAAG29-79-C-0176  and  by  the 
Office  of  Naval  Research,  Statistics  and 
Probability  Branch,  Arlington,  VA  27740, 
under  contract  N00014-75-C-0518. 


equations  are  routinely  used. 

AR  models  suffer  the  defect  that  spectral  zeros 
are  not  easily  modeled  with  low-order  schemes. 
Couple  to  this  defect  the  fact  that  sample-data 
versions  of  rational  continuous-time  processes 
are  autoregressive  moving  average  (ARMA)  (11], 
and  we  have  strong  motivation  for  identifying 
the  more  general  ARMA  models. 

Traditionally  the  emphasis  in  Identification  of 
ARMA  models  has  been  on  approximate  representa¬ 
tions  (such  as  "long  ARs")  that  lead  to  linear 
identification  procedures.  However,  more 
recently  there  has  been  a  flurry  of  activity  in 
exact  maximum  likelihood  formulations  and  non¬ 
linear  optimization  procedures.  Representative 
recent  offerings  Include  papers  by  Akalke  (1], 
Newbold  (2),  Osborne  (3],  Harvey  and  Phillips 
[4],  Ansley  [5],  Pearlman  [6],  and  Jones  [7]. 
Akalke  (1)  has  advocated  the  use  of  a  Markovian 
representation  for  an  ARMA  process,  which  is 
essentially  the  standard  form  state  space  model 
well-known  to  engineers.  Jones  [7]  has  used 
this  representation  to  formulate  exact  likeli¬ 
hood  equations  for  ARMA  processes  observed  in 
white  noise.  He  advocates  state  space  models 
for  the  calculation  of  exact  likelihood  with  or 
without  missing  observations.  Newbold  (2)  gen¬ 
eralizes  the  exact  likelihood  results  of  Osborne 
for  MA  processes,  and  of  Box  and  Jenkins  |8) 
for  AR  and  MA  processes,  to  include  ARMA  pro¬ 
cesses.  Ilarvey  and  Phillips  (4)  advocate  Che 
Kalman  filter  as  a  recursive  technique  for  com¬ 
puting  exact  likelihood  and  reference  Schweppe 
(9]  as  perhaps  the  first  investigator  to  write 
exact  likelihood  in  terms  of  prediction  errors 


or  "innovations".  Pearlraan  (6)  discusses  the 
fast  algorithm  of  Morf,  Sidliu,  and  Kailath  (10] 
for  updating  the  Kalman  gain,  and  compares  it  to 
the  algorithm  of  Ansley  (5]. 

Why  exact  likelihood  and  maximum  likelihood 
theory?  Perhaps  the  most  convincing  argument 
in  favor  of  an  exact  maximum  likelihood  formal¬ 
ism  for  identifying  ARMA  models  is  that  it  gives 
one  a  basis  from  which  to  approximate.  We  re¬ 
turn  to  this  point  in  Section  5.  Add  to  this 
argument  the  success  of  authors  like  Jones  (7] 
with  maximum  likelihood  identification  of  low 
order  (e.g.  p  _<  3)  ARMA  models.  Whether  or  not 
exact  maximum  likelihood  becomes  a  standardized 
identification  procedure  will  depend  in  large 
part  on  our  ability  to  efficiently  compute  like¬ 
lihood  and  to  make  good  parameter  adjustments 
using  nonlinear  optimization  procedures. 

Paper  Outline:  In  this  paper  we  begin  with  a 
general  discussion  of  maximum  likelihood  (ML) 
theory  for  identifying  ARMA  models  of  normal 
time  series.  A  modal  decomposition  of  the 
correlation  matrix  for  ARMA  processes  is  derived 
and  placed  in  context  with  Anderson's  work  [12] 
on  correlation  matrix  identification  when  the 
matrix  is  a  linear  combination  of  known 
matrices.  The  Identification  equations  that 
result  from  ML  theory  are  matrix  versions  of 
the  Algraln-Wllliams  equations  [13]. 

We  next  derive  a  linear  time- Invariant  predictor 
formulation  of  the  likelihood  function  based  on 
a  standard  form  or  Markovian  state  space  repre¬ 
sentation  for  an  ARMA  time  series.  In  this 
formulation  the  likelihood  function  is  an  infi¬ 
nite-dimensional  average  over  a  noncountably 
infinite  collection  of  conditional  likelihood 
functions.  The  conditioning  is  on  an  initial 
condition  (or  state)  vector.  The  values  of 
this  representation  are  these:  (1)  special 
initial  condition  assumptions  suggest  themselves 
for  approximating  the  exact  likelihood  function 
(two  of  the  most  popular  techniques  associated 
with  the  covariance  method  of  linear  prediction 
are  easily  Interpreted  in  terms  of  initial  con¬ 
dition  assumptions)  and  (2)  an  interesting  ML 
procedure  for  ARMA  parameters  and  initial  con¬ 
ditions  arises  as  a  potentially  useful  method 
of  approximating  the  exact  likelihood. 

We  use  the  results  of  our  linear  time-invariant 
predictor  formulation,  together  with  variations 
on  the  Bayesian  tricks  used  in  Box  and  Jenkins 
[8]  and  Scharf  and  Nolte  [14]  to  derive  exact 
likelihood  for  ARMA  processes.  The  prediction 
residuals  from  a  linear  time-invariant  predictor 
are  used  in  conjunction  with  a  least  squares 
estimate  of  the  initial  state  based  on  the  ob¬ 
served  data  record.  The  results  include  those 
of  Newbold  [2],  Osborne  (3),  and  Box  and  Jenkins 
[8]  as  special  cases. 

The  next  Item  of  business  is  a  time-varying 
(innovations)  representation  of  the  likelihood 
function  in  which  the  Kalman  gain  arises  as  the 


principal  computational  problem.  We  show  how 
this  gain  may  be  obtained  by  trlangularlzing  an 
(NxN)  correlation  matrix.  This  provides  a  fast 
Kalman  predictor  of  the  Morf,  Sidliu,  Kailath 
(10)  variety.  From  the  perspective  of  filtering 
or  time  series  analysis  the  innovations  repre¬ 
sentation  of  an  ARMA  process  provides  a  zero- 
initial  condition,  time-varying  linear  filter 
representation  of  a  stationary  process.  From  a 
batch  data  processing  point  of  view  the  innova¬ 
tions  representation  provides  a  sequential  tri- 
angularizatlon  of  the  inverse  correlation  matrix. 
Finally,  viewed  from  a  probabilistic  perspec¬ 
tive,  the  innovations  representation  solves  the 
Chapman-Kolmogorov  equation  that  arises  in 
connection  with  our  averaged  linear  time-invari¬ 
ant  representation  for  the  llkallhood  function. 

The  innovations  representation  is  shown  to  be 
equivalent  to  a  representation  in  which  data  is 
generated  as  the  output  of  a  sequence  of  AR 
filters  of  order  1,2,...  .  This  leads  to  an 

AR  lattice  for  computing  exact  likelihood  for 
AF.'tA  models. 

An  efficient  fixed  point  algorithm  for  computing 
the  Kalman  gain  is  discussed  and  computational 
requirements  are  compared  with  those  of  Ansley 
[5]  and  Morf,  Sidhu,  and  Kailath  [10]. 

2.  MAXIMUM  LIKELIHOOD  THEORY  FOR  ARMA  PROCESSES 

Let  Y  *  (Vq. . . ,  y  )'  denote  a  finite  version  of 
a  wide-sense  stationary  Gaussian  sequence  (y  r. 
Assume  the  mean-value  sequence  is  identically 
zero.  Denote  the  correlation  sequence  by  ir  !. 
The  correlation  matrix  for  Y  is  the  non-nega?ive 
definite  Toeplitz  matrix 


The  likelihood  function  for  a  realization  of  Y 
is 

*  Cn(Y)  -  -  |log2  -  jIorIi^!-  jY'R^S’  (1) 

We  make  no  not..’  *nal  distinction  between  ran¬ 
dom  variables  and  realizations  of  them,  relying 
instead  on  context  to  make  the  meaning  clear. 

A  tvpical  ML  inference  problem  is  to  maximize 
the  likelihood  function  with  respect  to  the 
correlation  matrix  K  and  call  the  resulting 
"estimate"  the  ML  estimate  of  K.  The  result  is 

R_,  *  arg  max  •*(Y> 

-  I  ;  C  -  YY 

Thus,  without  prior  par amet  r izat  ion  of  fyto 


p 


reflect  the  fact  that  It  must  be  Toeplitz  ami 
related  to  ail  ARMA  sequence,  ML  theory  leads  to 
an  Inefficient,  Inconsistent,  non-Toeplitz  esti¬ 
mate  of  R.  The  corresponding  spectrum  estimate 
is  the  periodogram,  a  notoriously  bad  estimate. 


Modal  Decomposition:  The  correlation  sequence 
of  an  ARMA  (p,p)  process  (p  poles  and  p  zeros) 
■ay  be  written 


Z  A  P  1 

,  m  tn 
m-1 


We  will  call  { P 1 “ 1 }  the  mC  mode  of  the  process 
and  A  the  corresponding  residue.  Each  is  a 
complex  zero  of  the  polynomial  A(z)  -  1  4majZ_*+ 
...+a  z"P  in  the  ARMA  representation 


Wt-1+-  •  -+Vt-p’ Wt-1+-  •  -+bput-p 


u£  :  i.i.d  N(0,o  )  r.v.s. 


We  assume  the  zeros  of  A(z)  and  B(z)  »  1  4  b.z 
+. ..+b  z“P  lie  inside  the  unit  circle.  The 
resulting  process  is  stable  and  minimum  phase. 


(2) 

-1 


The  covariance  matrix  RN  may  now  be  written  as 
a  linear  combination  of  symmetric,  linearlv- 
independent,  Toeplitz  modal  forms: 
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An  alternative  representation  is  [12] 
P  N-l  „ 


*N  ■ 


I  A  2  p  F  - 
ro- 1  "  t-0  “  C 


N-l 
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rtFt 


These  representations  correct  a  basic  defect  in 
the  original  formulation  by  forcing  the  ML  estl- 
mate  of  R  to  be  symmetric  and  Toeplitz. 


A  r. 


*N  *  ,  mm 
m*  1 


whe  re 


(A  ,G  »  arg  max  -  ^log2n-  iloglRi 

m  m  1  (A  ,p  |P  2  1  ^ 

m  ml 
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The  resulting  nonlinear  ML  equations  are 


for  poles  o^itr  R^*  ( n  ^R^C-D^J-0  (m-1 ,2 , . . .  ,p) 


for  residues  A 


m  b  m  N  m  (m,1>2 . 


P) 


constraints:!*-  TAG  G  *{p*- 
^  ,  n  m  m  m 

m-1 


- 1 i-j  i 
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Here  is  how  these  equations  may  be  used.  For 
any  choice  of  residue-pole  pairs  (An.c,,,)^  ob¬ 
jective  functions  of  the  form  p(Am,o  )  - 
tr  Rj|1[CnR^1C-Gm] ,  m-1. 2,.  ^p,  may  be  formed. 
These  functions  have  zeros  at  tAm, •>„,!,  the  ML 
estimates.  So  ML  estimates  may  be  found  from 
a  nonlinear  regression  algorithm  that  seeks 
the  zeros  of  '-(A^.c,,,),  m-1 ,2 , . . .  ,p. 


Linear  Time  Invariant  (Markovian)  Representa¬ 
tion  :  Let  iv  :  be  the  ARMA  (p,p)  sequence  of 
Eq.  (2).  TheCstandard  form  or  Markovian  state 
space  representation  for  (y  >  is 
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In  this  representation  it  is  important  to  note 
that  thb  p-dimcnsional  state  x^  is  a  vector  of 
s-step  predictors  y  <t_j(s-0, 1  •  •  •  •  •  P'D  based 
on  the  Infinite  past  i Vt_j <y t_2 • ■ • • ' : 


:'t4s/t-i  * 


The  unit  pulse  response  sequence  (li^l  and  corre¬ 
lation  sequence  t  r£  >  provide  invaluable  first- 
and  second-order  descriptors  for  the  ARMA  pro- 
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Figure  1  shows  feedback  diagrams  for  this  pro¬ 
cess  model  and  the  corresponding  predictor 
structure.  Note  that  both  diagrams  are  time- 
invariant . 
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Fig.  1.  Markovian  State  Space  Model  for  ARMA 
(p*p)  Process:  (a)  model, 

(b)  predictor 

Linear  Time- Varying  ( tnnov.it  ions)  Representa¬ 
tion:  The  time-varying,  or  innovat ions, repre¬ 

sentation  for  (y  Is 


In  this  representation  the  p- dimensional  state 
xt  is  a  vector  of  s-step  predictors  yt+s/t-i 
(s*0, 1 , . . . ,p-l)  based  on. the  finite  past 

<yt-i*yt-2 . yo): 


-Ely.,  /y. 


.ynl 


The  time-varying  unit  pulse  response  sequence 
(hj  and  the  (generally  non-stationarv)  corre¬ 
lation  sequence  |r*  )  provide  first-  and 

second-order  descriptors  for  the  ARMA  process: 
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With  k.tvt  chosen  to  be 
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*  r^  and  iy  'j?  is  wide-sense  stationary 
same  correlation  function  obtained  in 
ious  section.  Figure  2  shows  feedback 
for  this  model  and  the  corresponding 
r  structure.  Note  that  both  diagrams 
-varying.  The  innovations  model  is 
timo-varving  model  that  starts  from 
tial  conditions  and, nevertheless ,  gener- 
t at  ionary  sequence. 


(b) 


Fig.  2.  Innovations  Representation  for  ARMA 
(p,p)  Process:  (a)  model, 

(b)  predictor 

Exact  Likelihood  in  the  Markovian  Representa¬ 
tion:  From  the  predictor  structure  of  Figure  1 
we  note  that,  given  x..,  Y  is  distributed  as 
follows: 


f(Y/V  *  t;0  Syt(\/t-l'°  ) 

Here  N  is  the  normal  densltv 
yt 

Nyt(*t/t-r°2)  * 


exp(-  -ij  ( y 


Caution:  v  .  is  a  function  of  x^!  But  xQ 
is  distributed  as  follows:  " 


x0  :  NXo(O.Q0)-(2,)-P/::ko;-,J. 


exp(-  '  x0'!0 1  x0 ' 


The  exact  likelihood  is  l^(Y)  "  in  f(Y).  In 
Section  5  we  show  how  this  representation  may 
be  used  to  obtain  a  variety  of  approximate  like¬ 
lihood  functions. 

Exact  Likelihood  in  the  Innovations  Representa¬ 
tion  :  From  the  predictor  structure  of  Figure  2 
we  see  that  Y  is  distributed  as  follows: 


f  (Y) 


t-0 


Vyt/t-i-v 


(8) 


where  y^(  ^now  depends  only  on  the  finite  past 

and  v  is  a  time  varying  prediction  error  vari¬ 
ance.  c  The  exact  likelihood  is  (Y)  »  Jn  f(Y). 

Connections:  A  Chapman  Kolmogorov  Equation  and 
Initial  Conditions:  Comparing  Eqs.  (7)  and  (8) 
it  is  clear  that  the  innovations  representation 
has  solved  a  very  important  Chapman-Kolmogorov 
equation: 


/dx  N  (O.Q  >  s 
U  *0  U  t- 


y(yt/t-i(v-j  > 


N 

TT 

t«0 


VVt-i'V 


(9) 


(We  have  used  the  notation  Yt/t-l^o^  on  the  LHS 

to  distinguish  between  the  two  yt/t_i-)  This  is 
one  interpretation.  But  note  f(Y)  may  be  writ¬ 
ten 


f(Y)  -  f(Y/x0)f(x0)/f(x0/Y) 

What  this  means  is  that  solution  of  the  Chapman- 
Kolmogorov  equation  is  tied  up  in  the  solu¬ 
tion  for  f(xn/Y),  the  raster:.  :'i  density  for 
the  initial  condition  x.,  given  observations  Y. 
This  is  fundamental.  Always  the  problem  is 
initial  conditions.  In  Section  5  we  derive  a 
closed  form  expression  for  f(x0/Y),  and  thus 
derive  an  exact  time-invariant  realization  of 
the  likelihood  function. 


The  Innovations  Representation  is  Equivalent  to 
a  Sequence  of  AR(t)  Models  of  Increasing  Orders: 
The  innovations  representation  may  be  inter¬ 
preted  as  a  sequence  of  AR(t)  filters  in  dis¬ 
guise.  This  interpretation  provides  for 
another  constructive  approach  to  recursive  com¬ 
putation  of  exact  likelihood.  The  key  is  to 
use  the  properties  of  the  Toeplitz  correlation 
matrix  R^  to  achieve  a  sequential  computation 
in  terms  of  dimension. 


So  the  unconditional  densltv  of  Y  is 


Recall  the  original  likelihood  expression 


Partition  to  obtain  an  equivalent  partition 
of  the  Inverse, 


r0 
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where  .rj  represent  t lie  coefficients  of  the  opci 
roal  AR  prediction  of  order  j  and  «j  the  corre¬ 
sponding  variance.  The  sum  »i  squares  in  the 
likelihood  is  then  obtained  •  the  squares  of 
the  successive  outputs  of  a  . ime  varying,  order 
increasing  MA  filler  fed  with  the  data  to  gen¬ 
erate  the  prediction  residuals: 
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(13) 


With  °N  '  r0  -  r  RN-l  r  "  IRn!/IRN-iI‘  71,6 

vector  is  easily  recognized  as  the  set  of 
optimal  parameters  for  an  autoregressive  model 
AR(N)  of  order  N  fitted  on  the  given  model 
(AR>1A)  correlation  R^.  As  a  consequence  the 
likelihood  is  decomposed  according  to  (11): 

.2 

jn  *  Vi  -  ilog  -  i  5  -  Ylog  2"  <U) 

N 

Here  is  the  prediction  error  of  the  AR(N') 
model  measured  on  the  data  and  the  corre¬ 
sponding  variance. 

This  approach  is  obviously  equivalent  to  the 
preceding  innovation  representation  and  is 
merely  another  way  of  computing  the  innovations 
sequence.  The  whitening  condition  is  here  re¬ 
placed  by  the  orthogonality  condition  satisfied 
by  the  optimal  AR  estimated  model.  In  the  fol¬ 
lowing,  one  of  the  two  approaches  will  more 
naturally  give  rise  to  a  time  sequential  com¬ 
putation  of  likelihood  and  the  other  to  a 
batch  Implementation. 

3.  RECURSIVE  COMPUTATION  OF  LIKELIHOOD  FOR 
BATCH  DATA 


This  computation  is  started  with  zero  initial 
conditions  and  conveniently  weighted  by  the 
inverse  variance  t  . 

A  seemingly  more  convenient  implementation, 
first  proposed  by  Kailath  et  al.  (15),  makes 
use  of  a  now  classical  expression  of  the  in¬ 
verse  of  a  Toepiitz  matrice  [16]  in  terms  of 
the  coefficients  of  the  order  N  predictor  (i.e. 
first  column  of  the  triangular  factorization  in 
(12)), 
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where  a.  *  a 


N 


This  representation  is  suitable  for  time  invar¬ 
iant  Implementation  using  two  reciprocal  MA 
filters  of  given  order  N  and  N-l: 


In  many  applications  the  measurements  Vg...y 
are  readily  available,  and  can  be  processed  as 
a  batch.  Moreover,  the  likelihood  computation 
for  a  given  model  is  generally  imbedded  in  a 
larger  iterative  procedure  for  optimizing  the 
model  (such  as  nonlinear  programing)  which  is 
not  fitted  for  real-time  implementation.  An 
efficient  batch  method  for  evaluating  the  like¬ 
lihood  is  therefore  often  desirable. 

A  first  solution  Is  given  by  the  expression  for 
RjJ1  using  the  autoregressive  estimates  of  suc¬ 
cessive  orders.  The  appropriate  Choleskv  fac¬ 
torization  is 
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(15) 


The  outputs  are  then  squared,  subtracted, 
weighted  and  accumulated  to  compute  the  desired 
likelihood.  The  major  limitation  of  this  meth¬ 
od  is  evidently  the  restriction  to  a  fi:;ed 
sample  size  N. 

But  the  opposition  between  the  two  implementa¬ 
tions  of  Eqs.  (13)  and  (13)  is  only  apparent. 
The  time-variation  in  (13)  is  more  meaningfully 
Interpreted  as  an  order  adjustment  at  each  sam¬ 
ple.  The  two  methods  can  be  imbedded  in  the 
same  lattice  structure  using  only  the  reflection 
coefficients  ' ,  *  aj  (i.e.  last  row  of  the 
triangular  factorization  in  (12))[17).  The  pre¬ 
ceding  quantities  z^  are  then  available  at  the 
output  of  the  various  sections  of  the  lattice 
according  to  Figure  3. 


Fig.  3.  Use  of  an  AR  lattice  for  computing 
exact  likelihood  for  ARMA  process 

This  structure  is  well  known  for  its  excellent 
sensitivity  properties  and  allows  at  the  same 
time  a  convenient  sequentialization  of  the  com¬ 
putation  by  adding  new  sections  to  the  fixed 
preceding  ones. 


The  computational  requirement  for  the  above 
method  is  the  fitting  of  an  order  N  AR  predictor 
to  the  correlation  sequence  of  the  given  ARMA 
model  under  test.  This  is  conveniently  achieved 
using  the  classical  Levinson  algorithm  in  order 
of  operations  instead  of  N-^.  But  still  N  is 
large  in  many  applications  (e.g.  -  2S6)  and  the 
correlation  must  be  carefully  computed  (even  in 
this  noise  free  case)  to  ensure  the  necessary 
stability  of  che  predictions. 


RECURSIVE  COMPUTATION  OF  LIKELIHOOD  FOR 
REAL-TIME  DATA 


The  state  space  formulation  gives  a  nice  con¬ 
ceptual  way  of  generating  the  innovation  se¬ 
quence  in  the  well  understood  framework  of 
Kalman  filtering.  The  formulation  becomes  par¬ 
ticularly  interesting  when  we  note  that  there 
exist  efficient  algorithms  for  computing  the 
Kalman  gains.  This  efficiency  is  not  achieved 
using  standard  algorithms  that  ignore  the  fine 
structure  of  the  data.  Here  the  data  are  out¬ 
put  correlation  coefficients  for  an  ARMA  pro¬ 
cess,  and  the  state  space  equations  describing 
them  must  have  an  input/output  counterpart. 

More  precisely  the  convolutional  form  for  the 
time-varying  innovation  representation  is 


i“1  hi  Vi  +  "o 


which  relates  to  the  state  space  description  by 


the  Inverse  as  in  (12)  the  correl.it  ion  matrix 
Itself  mav  he  uniquely  decomposed  into  .in  upper 
and  a  lower  (transposed)  factor. 
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(16) 


Here  the  notation  "2"  stands  for  the  product  b 
its  transpose.  The  quantities  Sj  are  readily 
interpreted  as  cross  correlations  between  the  y 
and  their  prediction  error  for  a  predictor  of 
order  j.  As  the  r^  for  i>q  satisfy  the  AR  re¬ 
cursion  with  coefficients  a^,  it  is  also  the 
case  for  the  s^  i>q.  After  the  normalisation  b 
1/sJ,  the  s|  are  the  coefficients  of  a  time 
varying  impulse  response.  By  taking  the  first 
(p+1)  rows  and  columns  of  Rt,  it  follows  that: 
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As  a  consequence  the  time-varying  system  with 
impulse  response  h^  fed  with  white  noise  of  var¬ 
iance  v^  both  defined  such  that 

h^  -  s? / Sq  and  v£  -  s^  (18) 

reproduces  the  stationary  correlation  of  the 
process.  The  h?  for  i=l,p  coincide  with  the 
components  of  the  Kalman  gain  kt  and  vt  is  the 
corresponding  variance  of  the  innovations. 

An  efficient  algorithm  has  been  provided  for 
factoring  Rt  according  to  (16)  |18J.  Moreover, 
it  has  been  proven  to  generate  a  minimum  phase 
model  (19)  and  can  be  implemented  using  fixed 
point  arithmetic  (20): 
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This  impulse  response  description  will  now  be 
used. 


In  this  particular  case,  only  the  s^  for  i=Ofp 
are  useful.  They  are  computed  using  p  nonzero 
values  for  i<0,  so  that  the  required  number  of 
multiplications  per  step  is  approximately  2p. 

The  values  for  i 'p  may  be  deduced  if  necessary 
from  the  recursion  with  coefficients  a.. 


Instead  of  using  the  Cholesky  factorization  of 


Moreover  it  mav  be  noted  that  the  most  important 


role  In  the  behavior  of  the  model  Is  played  by 
the  MA  part.  For  an  AR-only  process,  the  Kalman 
gain  would  converge  In  p  steps  (p  first  AR  pre¬ 
dictors)  but  for  an  MA  process  the  gain  converges 
at  infinity.  It  seems  therefore  interesting  to 
consider  the  AR  and  MA  parts  separately  instead 
of  constructing  the  ARMA  correlation  sequence. 
This  can  be  done  by  extending  the  AR  part  of  the 
equation  in  order  to  include  the  zero  initial 
conditions  and  deduce  the  time  varying  MA  corre¬ 
lation  matrix  V  for  the  given  stationary  ARMA 
output  correlation  (^according  to  (20)  : 


iao  VaP 


v-  ap!  I 
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a  ...  a. 

p  0 


The  (pxp)  Toeplitz  matrix  of  unit  pulse 
responses  satisfies  c lie  linear  equation 


Special  cases  are 

MA:  H-6  AR  :  A  H  * 

P  P 

We  mav  summarize  init'il  conditions  by  estab¬ 
lishing  the  followin'  near  dependence: 


A*  V  t  A  V  ■ 

-p  0 

The  special  cases  an 


U  +  B  Un 

-p  0 


y  -  *  6*  r  +6 

0  -p 


A‘  Y-p  +  A  -0  ■  l'o 


Due  to  the  zero  initial  conditions,  the  V  matrix 
is  no  longer  Toeplitz  but  is  still  a  banded 
matrix  and  can  be  computed  using  a^  and  b^.  A 
generalization  of  Bauer's  Algorithm  [21]  may  be 
applied  in  this  time-varying  case  [22]  to  factor 
V  and  retrieve  the  Kalman  gains.  The  algorithm 
then  requires  approximately  q~/2  multiplies  per 
step  and  can  be  Interesting  for  small  q.  A 
closer  examination  of  V,  shows  that  it  is 
Toeplitz  except  for  a  pxp  right  bottom  matrix 
and  it  is  the  feeling  of  these  authors  that  the 
computations  could  be  reduced  using  other 
choices  for  the  initial  AR  and  MA  parts. 


5.  INITIAL  CONDITIONS 


We  return  to  the  ARMA  difference  equation: 

yt  +  alyt-l  +'"+  apyt-p  ’  ut  +  blVl  + 

. . .+  b  u 

p  t-p 

Let  us  establish  the  following  data  conventions: 


From  the  time-invariarit"state  space  representa¬ 
tion,  we  have  initial  conditions: 


Yo  -  HP  uo 


The  special  cases  are 


Q.  -  R  -HH'  (23) 

0  p  p  p 


vo  *  *0  +  6  uo  Yo  ■  xo  +  A_1  uo 

xo  1  fi*  %  xo  =  -Hp  A*  Y-p 

5.1  Approximate  Likelihood:  Fixed  Initial 
Conditions : 

Recall  the  likelihood  function  of  Eq.  (7).  Wit 
the  normal  densitv  N  (0,Qo)  replaced  by  the 
x0 

dirac  density  5(xn-x0),  we_have  the  approximate 
likelihood: 


fO7x0)  -  ^  ) 


y  -  .  Y„«  D  *  rn- 

-p  0  -p  0 


Question:  How  to  choose  Xq? 

Method  1:  With  xQ  ■=  Y.  we  are  assuming 
U  0  in  (23).  This  implies  QQ  *  R  .  The 
approximate  likelihood  is  simply  obtained  by 
summing  squared  residuals  in  Figure  1,  with 
Xq  fixed  at  Yq.  The  special  cases  are 


III 


M  I.Y 


tn  each  of  these  special  cases.  Initial  condi¬ 
tions  are  manufactured  outside  the  data  Interval 
with  a  backward  pi.'dictor.  In  the  AR  case  Y_p 

is  a  backward  pre  .ction  from  Yg.  The  resulting 

approximate  likel .hood  function  reproduces  the 
equations  of  the  covariance  method  of  linear 
predict  Ion- 

Method  2.  With  x  =  0  we  have  Yg  «  HUg. 
This  Implies  :  0.  Again  the  approximate 
likelihood  is  obtained  by  sunning  residuals  in 
Figure  1  with  Xg  fixed  at  0.  The  special  cases 
are 


So  f(Y,xQ)  is 


f(Y,;0)-(2.oV(ll+1)/2.,p(-  (LY-MXq)  *(ly-m*0)  j  (2/,) 

2o 

The  corresponding  likelihood  is  i(Y,x_)  * 
inf (Y,Xq) . 

The  AR  case  specializes  nicely: 
xn  -  Y  -  (H  H')R_1  Y. 

0  0  p  p  p  0 


Equivalently, 


Rp1  Yo  *  Qo1  *0 


In  this  method  initial  conditions  are  set  to 
zero  outside  the  data  interval.  In  the  AR  case 
the  resulting  equations  correspond  to  the  pre- 
wlndowed  method  of  linear  prediction. 

5.2  Approximate  Likelihood:  MAP  Estimation  of 
Initial  Conditions: 


A*Y  +  H-1  Q_  R*1  Y  -  0 

-p  p  p  0 

Note  this  reduces  to  the  covariance  method  of 
linear  prediction  under  the  approximation 
Q^R-^  -  l.  The  corresponding  likelihood  is 


As  a  second  alternative  to  approximating  exact 
likelihood,  consider  maximizing  the  Joint  den¬ 
sity  of  the  data  Y  and  the  Initial  conditions 
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max  f (Y ,xq) 


Write  out  the  time- invariant  (Markovian)  state- 
space  equation  for  t-0,l,...,N: 

Y  -  0xQ  + 

Y'  -  [y0...yN]  U'-  (vV 

O'  -  [c  ,  a'c  ,...,  (a')nc! 


The  second  term  on  the  RHS  represents  a  correc¬ 
tion  to  the  usual  covariance  method  of  linear 
p.  “diction. 

5.3  Exact  Likelihood: 

Write  the  joint  density  of  the  data  Y  and  the 
initial  conditions  Xg  as 

f(Y,x0)  -  f(Y/x0)f(xQ)  =  f(x0/Y)f(Y)  - 


We  have  the  following  linear  dependence 


U  -  L  Y  -  Mx0  , 


f(xQ)  :  prior  density  on  Xp  :  (0,Qg) 


L  -  H^1  and  M  •  H^1  0 

The  matrix  0  is  the  observability  matrix. 

The  joint  density  of  Y  and  Xg  is  obtained  from 
the  density  of  U: 

f(Y,xQ)  *  (2-o')~(N+1^2exp{ - ^  g(Y,xQ)) 

2o 

g(Y,xQ)  -  ( LY- MXq )”( LY-Mx0 ) 

Note  g('.,Xg)  may  be  written 

r(Y,x0)=r(Y,x0)+(x0-x0)M  M(x0-x0) 


f(Y/Xp)  :  conditional  density  : 


tI0  Nyt(>’t/t-l(V'0  > 


f(Xp/Y)  :  a  /.vs tjf‘:  >••  density  for  Xg, 
given  data  Y 

f(Y)  :  unconditional  density  of  Y 


witli 


Return  to  the  Joint  density  f(Y,Xg): 

f(Y.xQ)  -  (2no2)‘(N+2)/2exp(-  gCY.ig)  - 

2  o 
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2  o 


It  follows  fron  a  factorization  theorem  that 


f(xQ/Y)  -  (2xo2)‘p/2|M'M|l1 


exp( - j  (V*o>  'M'M(x0-x0>> 


Now  write 


f(Y)  -  f(Y/x0)f(xQ)/f(x0/Y) 


likelihood  have  been  analyzed  and  compared.  As 
In  previous  contributions  to  this  problem,  a 
central  role  Is  played  by  the  Innovation  of  the 
ARMA  process.  Other  derivations  of  exact  like¬ 
lihood  have  proceeded  within  the  framework  of 
Kalman  filtering  where  the  Innovations  play  an 
essential  part.  But  Innovations  need  not  be 
tied  to  Kalman  filtering.  In  this  paper  we  have 
emphasized  the  Innovations  representation  for 
the  ARMA  process  itself.  This  leads  to  a 
straightforward  computation  of  Kalman  gains  de¬ 
signed  to  generate  a  stationary  correlation 
sequence  in  spite  of  the  zero-initial  condi¬ 
tions  imposed  on  the  model.  The  state-space 
consists  of  one-step  through  p-step  predictions 
based  on  the  finite  past.  An  interesting  in¬ 
terpretation  is  that  the  time-varying  Kalman 
gains  comprise  the  time-vary  ing  >1A  part  of  an 
ARMA  model  with  fixed  AR  part. 


l(Y)  -  i (Y / xQ)  +  t(xQ>  -  !(xQ/Y> 
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The  first  summation  may  be  obtained  from  the 
time- invariant  predictor  in  Figure  1,  starting 
from  any  Xq.  But  the  choice  must  be  fixed  up 
with  the  remaining  terms. _  Several  choices  are 
of  interest  :  x„  -  xQ  ,  x  ■  0. 


6.  CONCLUSIONS 


Maximum  likelihood  (ML)  is  an  attractive  para¬ 
metric  method  for  fitting  models  to  observed 
data.  When  the  data  is  a  time  series  record 
and  the  underlying  model  is  ARMA,  then  ML  be¬ 
comes  a  parametric  method  for  identifying  linear 
models.  In  the  parlance  of  signal  and  system 
theory,  ML  becomes  a  method  of  identifying 
rational  discrete-time  systems  from  output  data 
only. 


In  general  ML  leads  to  nonlinear  equations  in 
the  parameters  to  be  identified.  This  is  re¬ 
flected  in  all  of  our  expressions  for  exact 
likelihood.  By  making  special  assumptions  about 
initial  conditions  one  can  obtain  in  the  AR  case 
quadratic  approximations  to  exact  ML  that  lead 
to  such  methods  as  the  pre-  windowed  and  covari¬ 
ance  methods  of  linear  prediction. 


The  innovations  of  a  stationary  ARMA  process 
need  not  be  interpreted  in  the  context  of  state 
space  models.  They  may  also  be  interpreted  as 
a  sequence  of  independent  random  variables  ob¬ 
tained  by  sequentially  whitening  the  correlated 
data.  The  whitening  transformation  . .  derived 
by  trlangularlzing  an  Inverse  covariance 
matrix.  The  triangularlzaticn  procedure  is 
equivalent  to  fitting  a  sequence  of  AR  models 
to  ARMA  data.  Stable  and  efficient  algorithms, 
which  compete  favorably  with  fast  Kalman  algo¬ 
rithms,  are  available. 

The  problem  of  computing  exact  likelihood  may 
also  be  cast  in  such  a  way  that  initial  condi¬ 
tions  on  a  time-invariant  predictor  play  an 
important  role.  The  initial  conditions  may  be 
pinned  at  an  arbitrary  value  provided  subse¬ 
quent  corrections  are  applied.  This  formula¬ 
tion  shows  the  way  to  approximations  of  exact 
likelihood  chat  may  be  superior  to  existing 
ones.  The  same  approach  could  have  been  ap¬ 
plied  to  final  conditions,  providing  a  com¬ 
pletely  symmetrical  setting  in  which  to  discuss 
forward  and  backward  prediction  outside  the 
data  window. 

As  mentioned  previously,  this  paper  is  focused 
on  one  aspect  of  ML  identification  of  ARMA 
models:  computation  of  exact  likelihood.  It 
is  our  feeling  that  some  progress  is  still  to 
be  made  on  this  problem  to  reduce  complexity. 
But  just  as  important  for  overall 
maximization  of  likelihood  would  be  efficient 
procedures  for  evaluating  gradients  and 
Hessians  in  the  framework  of  tlme-var.ing  inno¬ 
vations  representations  or  tine-invariant  pre¬ 
dictor  representations. 


If  no  approximations  are  to  be  made  then  two 
problems  arise:  (1)  efficient  computation  of 
exact  likelihood  for  each  set  of  candi¬ 

date  ARMA  parameters,  and  (2)  evaluation  of 
gradients  and/or  Hessians  for  cfliclent  itera¬ 
tion  to  a  solution. 

In  this  paper  we  have  focused  or.  problem  (1). 
Several  approaches  to  the  computation  of  exact 
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Abstract 


In  this  paper  we  present  a  general  fraaevork  for 
deriving  and  Interpreting  analysis  and  synthesis 
spectra  of  the  autoregressive  (AR)  and  aovlag  average 
(MA)  type.  By  analyzing  MA  and  AR  linear  transforma¬ 
tions  of  finlte-dlaenslonal  data  records  we  derive  MA 
type  spectra  that  are  direct  analogs  of  the  AR  type 
spectra  associated  with  the  maximum  likelihood  method 
(MLM)  and  the  maximum  entropy  method  (MEM)  of  spectrua 
analysis.  Asymptotically  the  MA  theory  is  tied  up  with 
Hold's  decomposition  in  the  same  way  the  AR  theory  is 
tied  up  with  Kolmogorov's  whitening  theory. 

By  parametrising  the  MA  and  AR  type  spectra  we 
obtain  a  variety  of  spectrua  models  that  trade  off 
resolution  and  power  fidelity.  He  propose  J-divergencc 
as  an  attractive  order  fitting  rule  and  show  how  it 
relates  to  final  prediction  error  (FPE)  and  Akalke'a 
Information  criterion  (AIC) . 

1.0  Introduction 


The  concept  of  wlde-sense  statlonarlty  seems  to 
underly  the  very  notion  of  a  spectrum.  It  is  a  mistake 
however,  to  conclude  that  one  should  identify  only 
stationary  models  when  estimating  spectra.  The  problem 
with  stationary  models  is  that  initial  conditions  mani¬ 
fest  themselves  as  nuisance  parameters  chat  must  either 
be  estimated  or  averaged  over.  In  nonstationary  models 
the  initial  conditions  manifest  themselves  as  time 
variations  (as  in  innovations  representations)  or  as 
order-increases  (as  in  lattice  representations) .  In 
either  case  the  initial  conditions  may  be  absorbed 
naturally  into  the  theory. 

In  this  paper  we  present  a  general  framework  for 
deriving  and  interpreting  analysis  and  synthesis 
spectra  of  the  autoregressive  (AR)  and  moving  average 
(MA)  type.  By  analyzing  MA  and  AR  linear  transforma¬ 
tions  of  finlte-dlaenslonal  data  records  we  derive  MA 
type  spectra  that  are  direct  analogs  of  the  AR  type 
spectra  associated  vlth  the  maximum  likelihood  aechod 
(MLM)  and  the  maximum  entropy  method  (MEM)  of  spectrua 
analysis.  Asymptotically  the  MA  theory  is  tied  up  with 
Hold's  decomposition  in  the  same  way  the  AR  theory  is 
tied  up  with  Kolmogorov's  whitening  theory. 

By  paraaetrlzlng  the  MA  and  AR  type  spectre  we 
obtain  a  variety  of  spectrua  models  that  trade  off 
resolution  and  power  fidelity.  We  propose  J-dlvergence 
as  an  attractive  order  fitting  rule  and  show  how  It 
relates  to  final  prediction  error  (F?E)  and  Akalke'a 
information  criterion  (AIC) . 

For  an  autoregressive  moving  average  (ARMA)  time 
series,  a  fast  iapulse  response  algorithm  may  be  used 
In  conjunction  with  an  innovations  representation  or  a 
synthesis  lattice  to  efficiently  realise  an  MA  type 
synthesis  of  ARMA  data.  Conversely  a  fast  Levinson/ 

This  work  supported  in  part  by  the  Office  of  Naval 
Research,  Statistics  and  Probability  Branch,  Arlington, 
VA,  under  Contract  N00014-75-C-0518  and  the  Army  Re¬ 
search  Office,  Research  Triangle  Park,  NC,  under  Con¬ 
tract  DAAG29-79-C-0176. 


Durbin  algoritha  may  be  used  1a  conjunction  with  a 
Kalman  predictor  or  an  analysis  lattice  to  efficiently 
realize  an  AR  type  analysis  of  ARMA  data.  In  the  con¬ 
cluding  section  we  outline  how  these  observations  lead 
directly  to  a  derivation  of  fast  and  exact  likelihood 
for  ARMA  time  scries. 

2.0  Linear  Filters  and  Representation 
of  8tationarv  Sequences 

Let  (y^)  denote  a  real,  zero-mean,  wlde-sense 
stationary  sequence  with  real  correlation  sequence 
(rt>.  This  sequence  hac  factorization 


He  call  (h&)  the  impulse  response  sequence,  for  reasons 
to  become  clear. 

Whenever  we  speak  of  (yc)  we  have  in  mind  the 
triple  ((yt),(ht>,{rt)). 

2.1  MA(-)  or  Wold  Decomposition 

The  Wold  decomposition  for  the  sequence  { y t }  is 
the  Infinite  moving  average  (MA(«)) 


y  •  I  h  u„ 

1  n-0  "  *“■* 

u,  :  white  sequence  (zero-mean,  unit  variance 
r.v.s.) 

The  coefficients  (h  }  are  called  the  MA(«)  filter  co- 
n 

efficients. 

First-Order  Descriptors.  Replace  (u  )  by  the  iapulse 
sequence  {6^}  to  obtain  the  impulse  response 


t  >  0 


ht  ■ 


The  complex  frequency  response  associated  with  this 
Iapulse  response  is1 

h(w)  -  11m  hH  C  (to) 

tt-  1 

+jw  +j(t-l)w 


Ct(w)  •  (l.e  ,...,e 
n 

h  ■  (hjj.hj . ht_j) 


) 


Superscript  B  denotes  Hermltian  transpose. 


2.6.1 


Second-Order  Descriptors.  The  correlation  sequence  la 


r  «  Z  h  h  .  i _  I 
t  „.0  n  n+|t| 


B(x)  -  Z  h  a 
n-0  a 


and  tha  whitening  filter 


The  corresponding  power  apectral  density  (or  magnitude- 
squared  frequency  response)  la 

r(«)  -  |h(w) 

MA(q)  Case.  In  the  MA(q)  case  the  moving  average  rep¬ 
resentation  terminates  and  we  may  write 


A(z)  -  Z  a 
n-0  n 

The  results  of  Sections  2.1  and  2.2  say  that  the  Impulse 
responses  of  H(x)  and  A(t)  annihilate  each  other  or, 
equivalently,  that  H(s)  and  A(s)  whiten  each  other: 


q 

y  “  Z  h  u 
C  n-0  ”  e-n 

The  resulting  specializations  In  the  first-  and  second- 
order  descriptors  are  obvious. 

2 . 2  AR(-)  or  Kolmogorov  Representation 


The  Komogorov  represent »t ion  for  {y£}  is  the 
inf inite aetoregresslon  (AR(-)): 


I  a  h  -  4 

.  n  t-n  t 
n-0 


A(z)H(z)  -  1 


3.0  Linear  Transformations  and  the  Representation 
of  Snapshots 

Let  Y  denote  a  t -sample  snapshot  of  (y£)  : 

Tt  "  (yo  yi  •"  yt-i)H 

The  correlation  matrix  for  Y(  la  symmetric  and  Toeplitz: 


:  •„  "  u,  i  *  0 

n  t-n  t  □ 

n-0 

The  coefficients  (a  }  are  called  the  AR(-)  filter  co- 
n 

efficients. 

First-Order  Descriptors.  Replace  (u^)  by  the  impulse 
sequence  {5^}  to  obtain  the  impulse  response 


Z  a  h  -  5. 
n-0  "  t*n  e 


The  complex  frequency  response  is 
a(u)h(u)  -  1 

a(u>)  •  11m  cj)  £ 

tt-  C 


a  -  (aQa1  ...  a^) 

Second-Order  Descriptors.  The  correlation  sequence  is 
characterized  by 


Z  a  I  r  a  -  5 
n  m  e-n+s  a  s 

m-0  n-0 


L  t-i  ‘o  j 

This  matrix  has  LO  Cholcsky  decomposition 

R  «  H  H8 
t  t  t 


<  r~ 

k!  "5  0 

0 

hS 

°  ! 

htii  hti 

Cl. 

from  which  it  follows  that  the  spectrum  is 
|a(w)|2  r(w)  •  1 

AR( p)  Case.  In  the  AR(p)  case  the  autoregression  ter¬ 
minates  and  we  may  write 

P 

Z  a  v  •  u 
„  n  't-n  t 
n-0 

Obvious  specializations  of  all  first-  and  second-order 
descriptors  result. 


We  call  the  s  row  vector  h*  the  order-s  HA  synthe¬ 
sizer,  for  reasons  to  become  clear,  and  the  st"  column 
vector  h  the  impulse  response  of  an  MA  linear  trnas- 
formatloift  to  an  impulse  applied  at  time  s.  When  we  tie 
up  MA  and  AR  theory  we  will  see  that  other  interpreta¬ 
tions  accrue,  as  well. 

The  matrix  R-*  has  UL  Cholesky  decomposition 


h3  Ty.^a  'Jr  the  ?!A(")  and  AR(-)  Representations 
Define  the  synthesis  filter 


2.6.2 


i 

i 

a 


I 


N 


£ 


We  call  che  sC  row  vector  a*  Che  order-a  KA  whltener, 
for  reaaona  co  become  clear,  and  che  sth  column  vector 
a  che  exciter  of  an  AR  linear  transformation  required 


Ht(u;L)  •  (hg(u) , . . . ,ht_^(u)) 

h#<m)  -  cj^m)^ 

RFR.  The  right  frequency  response  is  the  object 
Rt(«;R)  -  Ht  Ct(u) 

This  Is  e  column  vector  of  phased  complex  frequency 
responses  for  the  KA  synthesizers  h* : 

He(u.;R>  -  (h°(w) . ht-1(u>))H 

h%)  -  h*  Ct(u) 

Second-Order  Descriptors.  R  is  our  obvious  second- 
order  descriptor.  By  analogy  with  linear  filtering 
theory  we  can  try  to  associate  spectra  with  the  norms 
of  our  complex  frequency  responses. 


co  generate  an  impulsive  (not  Impulse)  response.  When 
we  tie  up  MA  and  AR  theory  we  will  find  other  inter¬ 
pretations  and  connections. 

Whenever  we  speak  of  Y^  we  have  in  mind  Che  triple 

Ort*Ht>Rt*  or>  •Ruiv*l«ntlyt  the  triple  (Yt,At,RtS. 


LFR.  Define  the  left  spectrum  as 
RtU;L)  -  |Ht(u;L)!2 


R 


Ct(u) 


3.1  KA  or  Synthesis  Transfoemat ion 

The  snapshot  has  MA  cype  representation 


C-1  , 

-  t  i h  (ui  r 

s-0  S 


Ht  Ut 


:  white  vector  (EUt-0  : (txt)  identity) 


The  output  y  may  be  written 


This  result  leads  to  several  important  observations: 

(1)  t-1  times  the  left  spectrum  is  the  triangu¬ 

lar  windowed  or  Bartlett  spectrum: 


s 

Z 

n-0 


For  this  reason  we  call  hs  the  order-s  MA  synthesizer. 
In  che  limit  st«  we  have 


j  c"(o))Rt  Ct(u») 


t-1 

Z  (1- 

n— (t-1) 


lim  hS  -  h  V  n»0 

,t-  n 

where  h  Is  the  nth  MA(-)  filter  coefficient  in  the 
n 

Wold  decomposition. 


(2)  the  Bartlett  spectrum  Is  an  average  of 
magnitude-squared  frequency  responses 
for  the  impulse  responses  h  : 


j  c“(u))RtCt(u.) 


1  t-1  2 

7  *  |hs(-)l2  . 

s-0 


First-Order  Descriptors.  Replace  by  the  identity 
I(  to  obtain 


Yt  *  Ht 

As  1^  corresponds  to  a  column  sequence  of  delayed 
impulses,  we  call  a  column  sequence  of  impulse  re¬ 
sponses  to  impulses  applied  as  time  s.  The  entry 

h**11  in  J»s  is  the  response  at  time  s+n  to  an  impulse 
applied  at  time  s.  It  is  easy  co  see  chat 

.  n  .0 

h  -  r  /h. 
n  n  0 

There  are  at  least  two  definitions  we  can  give 
for  the  "frequency  response"  of  the  HA  transformation 
H  . 
t 


LFR.  Tha  laft  frequency  response  la  the  object 

Ht(u;L)  -  c"(w)«t 

This  is  a  row  vector  of  phased  (or  delayed)  complex 
frequency  responses  for  the  impulse  responses  h  : 


(3)  if  the  average  above  is  (arbitrarily?) 
truncated  at  s-0,  something  akin  to  the 
rectangular  windowed  spectrum  results: 


,  t-1  . 

;2  •  :  h  e‘jsu' 

3-0  S 


i  t-1  ,  i 

1  r  -jnu)  •  2 

- 0T  '  1  rn  e 

:h0  n-0 

RJR .  Define  the  right  spectrum  to  be 


Rt(w;R)  •  |H  (w:R)|" 

t-1  - 

-  z  ihs(u)r 


This  leads  to  the  following  observations: 

(1)  t”1  times  the  right  spectrum  is  an  average 

of  magnitude-squared  frequency  responses 
for  the  KA  synthesisers  h*: 


2.6.3 


7»>.M  *7  I  |h*(»)|2  , 

'  c  c  s-0 

(2)  If  the  average  above  is  (arbitrarily)  modl- 
f lad  to  Include  only  the  t-1  tera,  the 
following  maximum  order  KA  synthealzer 
spectrum  results: 


|hC'1(u.)|2 


|V  h1-1  e'^S“|2 

s-0  * 


MA(q)  Casa:  In  the  MA(q)  case  tha  MA  synthesizing 
transforaatlon  specializes  as  follows: 


h2  hi 


The  important  thing  to  notice  here  is  that  for  q<s±t-q 

(t>^2q+l) ,  tha  MA  synthesizer  h®  looks  Ilka 
the  impulse  response  h^. 

LFR.  In  the  MA(q)  case  the  LFR  specializes  as 
does  Its  magnitude  squared.  The  magnitude  squared 
becomes 

q-i  2  2 

R,  „(<u;L)  -  1  |h  (w)  |  +(t-2q)  |h  (u>)  | 

t,q  s-0  ®  q 

t-1  , 

+  i  |h  («)T  . 

s-t-l-q  * 

a  linear  combination  that  lends  extra  weight  to  the 
qtn  impulse  response. 

RFR.  In  the  MA(q)  case  the  RFR  specializes,  as 
does  it  magnitude-squared.  The  magnitude-squared  be¬ 
comes 

R„  <w;R)  •  V  |h*(w)  |2  (t-q)  | hq(w)  | 2  . 

C,q  s-0 

a  linear  combination  that  lends  extra  weight  to  the 
qth  MA  synthesizer. 

3.2  AR  or  Analysis  Transforaatlon 

The  snapshot  has  AR  type  representation 

At  Tt  ’  Ut 

The  outputs  y(  to  y^  may  be  filtered  to  obtain 


Tor  this  reason  we  call  a  the  order-e  MA  vhltener  or 
analyzer.  In  the  limit  7+—  we  have 

11m  a*  -  a  M  n>0 
at-  "  " 

where  a  la  the  nCt<  AR(-)  filter  coefficient  in  the 
n 

Kolmogorov  representation. 

First-Order  Descriptors.  Replace  Ut  by  It  to  obtain 
At  Bt  "  *t 

As  In  the  MA  case,  Is  the  coluam  sequence  of  impulse 
responses  .  The  entries  in  h^  have  the  same  inter¬ 
pretation  as  before. 

If  U£  is  replaced  by  an  exciter  aatrlx  that  makes 
Y^-I^ ,  an  impulsive  response,  we  obtain 


For  this  reason  we  call  A£  also  a  column  sequence  of 
Impulsive  exciters. 

LFR.  The  left  frequency  response  Is  the  object 
At(w;L)  -  CB(w)  At 

This  is  a  row  vector  of  phased  complex  frequency  re¬ 
sponses  for  the  impulsive  exciters  a^: 

Ag(<ii;L)  »  (a0(oi),...,at_1(w)) 

.,(«)  -  C^(u)  a^ 

RFTL_  The  right  frequency  response  Is 
At(w;R)  -  At  Ct(w) 

This  la  a  column  vector  of  phased  complex  frequency 
responses  for  the  MA  whitsners  a*: 


At(ui;R)  -  (a“(cu)  .  .  .  a*  1(w)) 
a* (w)  -  a*  Ct(w) 

Second-Order  Descriptors.  R(  (or  R~2)  is  our  obvious 

second-order  descriptor.  We  can  try  to  associate  spec¬ 
tra  with  the  norms  of  the  complex  frequency  responses. 
But,  as  A£  Is  a  whitening  transforaatlon,  we  ought  to 

use  the  Inverse  of  these  spectra  as  our  spectral  defi¬ 
nitions  for  T(. 

LFR.  Define  the  left  whitening  or  anslysls  spec¬ 
trum  as 


Rt(w;L) 


|At(w;L)| 


CB(u.)AtABCtU) 


i  •_(«)! 


Several  observations  can  be  made: 


I  a®  y  -  u 

n-0  n  — “  ® 


2.6.4 


(1)  The  Inverse  of  t  times  the  left  whitening 
spectrua  Is  an  average  of  order-increasing 


* 

v; 

exciter  spectra: 

w 

1  .i^laWl* 

*’ ; 

t**U;U  1  s-0 

(2)  If  the  sun  is  truncated  at 

bCi 

spectrum  Is  Jhe  inverse  of 

L 

trua  |a_(u) |  . 

ant. 


Define  the  right  whitening  spectrum  at 
1 _ 


Rt(w;L)  - 


I  At <«}*>] 


^•1  2 

£  |e*(w)r 

a-0 

Several  inter  eating  obaervatlone  nay  be  made: 

(1)  The  right  whitening  apectrua  la  the  aexloua 
likelihood  method  (MLM)  apectrua, 

(2)  The  inverse  of  t  tiaae  the  right  whitening 
apectrua  la  an  average  of  order-increasing 
whitening  apectra: 

1  1  ‘Z1  i  a,  N ,2 

- -  —  £  |a  («)  I 

tR~(ui;R)  1  s-0 


(3) 


if  the  aua  is  arbitrarily  modified  to 
include  only  the  t-l  term,  the  maximum 
entropy  method  (MEM)  spectrum  reaults: 


lat'1(u)!2 


AR(p)  Case.  In  the  AR(p)  case  the  A*  synthesizing 
transformation  specializes  as  follows: 


At,p’ 


For  p^b£t-p  (t_2p+l)  the  MA  whltener  a*  looks  like 
the  transpose  of  the  exciter  a ^ ■ 

LFR.  In  the  AR(p)  case  the  LFR  epeclallzea  as 
does  its  magnitude-squared.  The  left  whitening  spec¬ 
trum  becomes 


Rt.p(“!L)  ‘ 


t-i 


£  |a  (u) |2+(t-2q) |a  (w) |2+  £ 

S-0  *  P 


l»  («> 

s-t-l-q 


a  linear  combination  that  lends  extra  weight  to  the  p 
exciter  apectrua. 

1TR.  In  the  AR(p)  case  the  rlfht  whitening  spec- 
trua  specializes  to  the  following  linear  combination: 


th 


*  («*;*)  "  — 3; - * 

£  |a*(u)  |2+(t-p)  |ap(ui)  |2 

a-0 

This  lends  extra  wight  to  the  pth  MA  whltener. 

3.3  Ty<nB  up  the  HA  and  AR  Theory 
We  have  the  relations 

At  Ht  '  Xt 
Ht  At  “  X* 

Proa  these  the  most  succinct  interpretations  of  Ht  and 
A£  are  the  following: 


Ht  ‘ 


.t-l 


;  h*: order-s  MA 
~  synthesizer 


So 


Si 


S2 


;  h  :  Impulse  re- 
'sfonse  for  AR(s) 
filter  correspond¬ 
ing  to  order-s  MA 
whltener  a* 


A£  -I 


t-y 


whltener  or 
analyzer 


*1 


i2 


St-ll 


a  : impulse  re¬ 
sponse  for  AR(s) 
filter  correspond¬ 
ing  to  order-s  MA 
synthesizer  h® 


2.6.5 


4.0  Parametrized  Spectra 

Th«  results  for  MA(q)  and  Aft(p)  linear  transfor¬ 
mation*  lugjiit  that  the  MA  lynthaal*  and  tha  At  anal¬ 
ysis  transformations  H  and  A  nay  ba  approximated  by 

til  til  *  * 

tha  q  and  p  ordar  approxlaants  and  A.  . 

i|P 

This  kind  of  thinking  In  tha  linear  transformation 
world  Is  dlractly  analogous  to  ths  kind  of  thinking 
that  goes  on  In  tha  linear  filtering  world  whan  we 
Identify  order-q  HA  and  order-p  AS  filters  to  nodal 
data  that  Is  surely  inf  ini  te— dimensional.  So  think  of 

H  and  A  as  parametrized  synthesis  and  analysis 
t  »q  - »P 

linear  transformations  that  can  be  used  In  place  of 
H£  and  Ag  In  the  various  spectra  defined  in  Sections 

3.1  and  3.2  to  obtain  parametrized  analysis  and  syn¬ 
thesis  spectra. 

In  the  sections  to  follow  we  explore  the  so-called 
parametrized  maximum  likelihood  method  (pMLM)  spectrum 
that  results  from  replacing  A  by  A  in  the  right 
whitening  spectrum: 


t  t,p  t 


PI1|a*(u.)|2+(t-f)|ap(u)|2 

*-o 


R_1  «  AH  A 

t.p  t,p  t,p 


Let’s  agree  to  call  R~  (w£),  the  pMLM  spectrum 

t.P 

MLM„  <w) . 
t.P 

4.1  Limiting  Results 

The  pMLM  speccrum  may  be  written 


MLM  (w)  MLM  ,(u)  MEM  (u) 
t.P  p-1  P 

where  MLM  , (w)  is  the  maximum  likelihood  method  spec- 

p— i 

trua  of  order  p-1  and  MOl^Coi)  la  the  maximum  entropy 
method  spectrum  of  order  p. 

p-t-1.  In  this  case  the  pMLM  spectrum  Is  ths 
MLKt_1  spectrum. 

p  fixed;  tt«,  In  this  case  the  pMLM  spectrins 
goes  to  the  MEM^  spectrum: 

limtMLM  (u)  -  MEM  (u) 


From  these  results  we  see  that  the  pMLM  spectrum 
defines  a  two-parameter  class  whose  properties  lie,  in 
some  sense,  between  those  of  the  MLM  and  the  MEM  spectra. 

4.2  numerical  Results 

We  examine  the  performance  of  tha  pMLM  spectna 
for  three  different  sets  of  correlation  data.  For 
purposes  of  comparison  with  published  results,  we  use 
the  example  of  two  close  sine  waves  in  noise  studied 
by  Lacoss 


resolve  quite  successfully  the  two  close  peaks  whi¬ 
ttle  KLMj^u)  cannot.  Another  interesting  obsarvatUe 

concerns  the  power  estimation.  The  parametrised 
W^i.iiM  Mtlmates  accurately  the  3dB  different.  u 

the  peaks,  while  the  MZMj^Cu)  spectrum  gives  on  the 
order  of  3dB. 

Tha  second  example  shown  la  Flgurs  2  l*  based 
synthetic  AR  data.  We  generate  4th  order  u  dets* 
corresponding  to  the  transfer  function 

B(z)  - - r 


The  same  comments  can  be  made  -  the  parametrized 

“*21,  5(w)  yielda  a  such  better  resolution  than  the 
MLMj(w) . 

Finally  in  Figure  3  we  have  applied  the  netted  ts 
a  finite  set  of  recorded  speech  data.  The  data  used  u 
a  set  of  80  data  points  saapled  from  the  vowel  "I"  a* 
the  sentence,  HI  hope  it's  April".  This  example  u;.,. 
tratea  very  well  the  compromise  between  resolution  t&t 
smoothness.  The  three  spectra  Illustrated  require 
exactly  the  same  amount  of  computation.  However,  ths 
parametrized  MLM^  ^(tu)  exhibits  a  dramatic  improve¬ 
ment  In  resolution  over  the  MLMn(ui)  end  a  such  better 
power  estimation  of  tha  second  peak  over  the  MEM  ■’«! . 

The  results  are  summarized  in  the  following  way. 
The  pMLM  spectrum  can  be  used  in  two  conceptually 
different  ways.  First  If  one  Is  interested  in  saving 
parameters,  then  a  resolution  comparable  to  the  ML<t(-i 
can  be  achieved  with  the  pMLM£  with  p<t.  t 

On  the  other  hand  if  better  resolution  is  sought 

{r£)0°  1s  available,  then  by  extending  the  correlate 
matrix  to  t>tQ,  a  resolution  close  to  the  MEM(tQ)  css 

be  obtained  while  preserving  the  power  estimation  of 
the  MLM.  As  a  result  we  see  that  the  parametrized  XL* 
realizes  a  compromise  between  the  smoothness  of  the  XL* 
spectrum  and  the  resolution  of  the  MEM. 

4,3  Order  Fittlnx 

The  essential  problem  In  all  of  this  Is  order- 
fitting:  deciding  when  Rc  p,  for  example,  is  clot* 

enough  to  R„  to  decide  that  H  or  A  ls  a  good 
t  t.p  t.p 

enough  approximation  to  R£  or  A£.  In  the  context  of 

the  parametrized  maximum  likelihood  method  (pMLM)  two 
order  fitting  rules  arise  naturally. 

J-Dlvergence:  Suppose  we  are  given  a  finite  data  »*t 

T  with  correlation  matrix  R„ .  The  parametric 

t  1 

model  Is  determined  by  the  correlation  matrix  R£  ?  s< 

defined  earlier.  The  problem  of  order  fitting  is  one 
of  discriminating  between  the  two  hypotheses  H(  and 

H„  from  the  data  Y  : 
t.P  t 

Ht  :  Y  :  H(0,R£) 

"...  ‘  • 

Tha  divergence  between  the  two  hypotheses  is  defined 


r,«4t  ♦  5.33  cos(.3nt)  +  10.66  cos(.4»t)  tc[0,21]. 

In  Figure  1  It  is  observed  that  by  extending  the  corre¬ 
lation  matrix  to  obtain  the  MLMj^  j^(u)  •P^ttrum  wa 


Jt(p,t)  -  W  ^  L£(y  ) 

'  t.P  t 

where  L£(x)  Is  the  likelihood  function  L£(x)  * 
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K*/H  )/P<»/«t>*  The  set  of  4ict  1«  auppoood  norm- 
«liy  distributed.  Using  tho  identity  E(xTQx)  - 
Tr(QE(xxT)).  tho  J-divergance  to 


J  ,<*•*> 


Tho  J-divergance,  while  not  o  true  metric,  provides  e 
eeesure  of  tho  distance  between  the  two  hypotheses.  It 
eon  also  he  viewed  as  a  measure  of  how  fax  01 

I  i'1  aie  froe  the  identity  matrix.  If  X.,i«0,l,..., 

t(p  {  ♦ 

t-i.  are  the  eigenvalues  of  the  matrix  *e*tip.  th“  *• 
caa  also  vTlte 


Jt<P.t) 


This  suggests  that  the  J-dlvergance  is  more  simply 
Interpreted  as  a  measure  of  the  dispersion  of  the 
eigenvalues  around  unit. 

J-dlvergm>ce  is  twice  the  Kullbech-Leibler  Informa¬ 
tion  used  by  Akalke  in  his  discussion  of  information 
(dter  is  and  order-fitting.  Thus  Aka  Ike's  derivation 
of  the  AIC  can  be  translated  in  its  integrity  in  terms 
of  the  J-dlvergence,  and  the  following  order  fitting 
rule  is  proposed: 

Jt  (p)  •  V*,P)  + 

«  Simpler  Distance  Measure:  The  calculation  involved 
la  the  J-dlvergence  is  quite  large.  We  ere  after  a 
staple  order  fitting  rule  easy  to  compute  at  each  order 
p.  An  intuitive  measure  of  the  distance  between  IT'1 

-1  '  * 

and  kg  p  consists  in  getting  a  measure  of  hov  far 

—  ,  p<i<t-l,  are  from  the  constant  value  —  .  Consider 

I,  01 

1  p 

the  following  quadratic  mean: 


DME(p) 


1 

t-p-1 


t-1 

I 

i-p-1 


lote  that  If  the  given  correlation  sequence  corresponds 
to  an  exact  autoregressive  process  of  order  p^,  than 

3H(p)»Q  for  P^Pq. 

Numerical  Results:  In  Figure  6  we  apply  the  cri¬ 
teria  to  the  set  of  correlation  data  estimated  from 
synthetic  AR(4)  data  generated  in  Example  n*2  of  Sec- 
tioc  1.2.  The  simple  measure  and  the  J-dlvergence 
ve  compare  to  the  minimum  prediction  error.  Figure  5 
illustrates  the  similarity  of  behavior  between  the 
'-divergence  corrected  for  asymptotic  bias  and  the  AIC 
criterion.  We  have  applied  the  criteria  to  the 
speech  data  in  Figure  6.  Again  the  J-dlvergence  cor- 
'•****  behaves  like  the  AIC  but  has  a  sharper  and 
clearer  aL-imum.  Here  the  simple  measure  has  a  slow 
cnvtrgenct  rate  that  would  be  misleading  in  the  order 
fitting. 


,5.0  Conclusions 

Many  of  the  ideas  la  this  paper  are  speculative. 
y»e  spectra  defined  sea  to  be  plausible  definitions 
•  or  the  frequency  responses  of  nonatationary  synthe¬ 
sisers  and  vhltenera  (analysers) .  The  fact  that  the 
lo°  leads  to  Bartlett  and  MLM  spectra  lands 
****  £I*dm>ce  to  the  line  of  reasoning.  Tbs  numerical 
'Mules  indicate  that  pMLM  -  and  hopefully  its  other 
Mrsmet rixad  counterparts  as  well  -  can  become  useful 
ef#  Ct*  l°  *n<1  HEM  processing.  The  pMLM  seems  to 
"  •  tr*deoff  between  the  resolution  of  MEM  and  the 


power  fidelity  of  MLM. 

All  discussions  of  MA(q)  and  AK(p)  linear 
transformation  generalises  to  ABMA(p,q)  linear  trans¬ 
formation.  Such  transformations  underly  lattice  and 
Innovations  representations  of  stationary  time  series 
and  the  corresponding  lattice  and  Kalman  predictor 
structures  that  play  such  an  Important  role  in  exact 
likelihood  for  ABKA  time  series.  The  nonstatlonarlty 
(or  Initial  conditions)  of  the  linear  transformations 
ere  captured  In  the  order  increase  for  the  lattice  and 
the  time  variation  for  the  Kalman  gain.  By  combining 
faat  Levlnson/Durbln  and  Impulse  response  algorithms, 
the  lattice  and  Kalman  predictors  lead  to  fast  and 
enact  computation  of  exact  likelihood. 

Koopmaos[l]  contains  a  more  complete  discussion 
of  the  MA(“)  and  AR(>)  theory  of  stationary  time  ser¬ 
ies.  The  Interpretations  of  *t  and  R"1  ere  evident  in 

the  Influential  work  of  Fried lander .  Kellath,  end  Morf. 
The  connect ion  abet ween  MLM  end  MEM  are  due  to  Burg  [2]. 
The  Interpretation  we  give  for  the  Bartlett  spectrum 
may  be  close  to  that  given  by  Oppenheln  at  the  I960 
l'Aquila  conference.  The  Lacoss  example  is  reported 
In  [3].  The  appropriate  references  to  Akalke  are  [4] 
and  [5].  Other  topical  work  on  order-fitting  is  re¬ 
ported  by  Fart an  [6]  and  Klasanen  [7].  Our  motivation 
to  use  J-dlvergence  arose  from  (8]  and  [9].  The  sim¬ 
plified  EKE  rule  is  discussed  more  fully  in  (10]. 

Exact  likelihood  is  reported  in  [10]-[12].  Variations 
on  and  faat  versions  of  the  Levlnaon-Durbln  algorithm 

(13]-[14]  are  reported  In  [15],  [16],  and  [10],  where 
respectively  the  MSK  algorithm,  the  Impulse  response 
algorithm,  and  a  fast  Impulse  response  algorithm  ere 
derived. 
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Abstract 

Les  model es  autoregressifs  a  moyenne  mobile  (ARMA)  sont  des  approxi¬ 
mations  utiles  des  processus  aleatoires  communement  rencontrds  dans  le 
traitement  des  signaux  a  temps  discrets.  De  tels  model es  sont  utilises  a 
la  compression  des  donnees  en  transmission  d' informations  a  bas  debit,  a 
1 'amel ioration  de  la  resolution  en  analyse  spectrale,  a  la  prevision  en 
economie,  meteorologie  et  autres  series  numeriques. 

Dans  cet  article  nous  discutons  differents  aspects  de  1 1  identifica¬ 
tion  des  mcdeles  ARMA  par  le  maximum  de  vraissemblance.  Nous  soulignons 
le  role  de  la  representation  de  "1 'innovation"  dans  le^lcul  de  la  fonc- 
tion  de  vraissemblance  exacte.  Finalement  nous  montrons*  comment  la  struc¬ 
ture  interne  du  model e  peut  etre  mise  a  profit  pour  accelerer  le  calcul  de 
la  fonction  de  vraissemblance  soft  par  un  predicteur  rapide  de  Kalman  soit 
par  1 'implementation  d'une  structure  en  Schell e  (Lattice)  rapide. 

Autoregressive-moving  average  (ARMA)  models,  are  useful  approximants 
to  the  kinds  of  random  processes  commonly  encountered  in  discrete-time 
signal  processing  applications.  Such  models  may  be  used  to  compress  data 
in  low  bit-rate  information  transmission,  improve  frequency  resolution  in 
spectrum  analysis,  and  to  forecast  in  economic,  meteorological,  and  other 
time  series. 

In  this  paper  we  discuss  several  aspects  of  the  maximum  likelihood 
theory  of  parameter  identification  in  ARMA  models.  We  highlight  the  role 
of  innovations  representations  In  exact  likelihood  theory  and  show  how 
internal  model  structure  may  be  used  to  speed  up  calculation  of  likelihood 
in  either  fast  Kalman  predictor  or  fast  lattice  implementations. 


This  work  supported  by  the  Office  of  Naval  Research,  Statistics  and 
Probability  Branch,  Arlington,  VA  27740,  under  contract  N00014-75-C-0518, 
and  by  the  Army  Research  Office,  Research  Triangle  Park,  NC  27709,  under 
contract  DAAG29-79-C-0176. 
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1 .  INTRODUCTION 

The  random  processes  encountered  in  signal  processing  applications  are 
typically  lowpass  or  bandpass  processes  in  which  redundancy  is  high.  This 
means  finite-dimensional  models  may  often  be  used  to  approximate  the 
second-order  properties  of  the  processes.  The  dominant  motivations  for 
using  finite-dimensional  models  are  (1)  they  provide  a  systematic  frame¬ 
work  for  deriving  data  compression  and  frequency  resolution  improving 
algorithms,  and  (2)  they  become  predictor  formulae  for  event  forecasting. 

The  problems  of  data  compression,  resolution  improvement,  and  forecasting 
are  "solved",  so  to  speak,  by  identifying  a  parametric  model  that  either 
infinitely  extends  a  data  correlation  sequence  or  matches  the  data,  itself, 
in  a  least  squares  or  maximum  likelihood  sense. 

AR  models  suffer  the  defect  that  spectral  zeros  are  not  easily  modeled 
with  low-order  schemes.  Couple  to  this  defect  the  fact  that  sample-data 
versions  of  rational  continuous-time  processes  are  autoregressive  moving 
average  (ARMA),  and  we  have  strong  motivation  for  identifying  the  more 
general  ARMA  models. 

Traditionally  the  emphasis  in  identification  of  ARMA  models  has  been  on 
approximate  representations  (such  as  "long  ARs")  that  lead  to  linear  iden¬ 
tification  procedures.  However,  more  recently  there  has  been  a  flurry  of 
activity  in  exact  maximum  likelihood  formulations  and  nonlinear  optimiza¬ 
tion  procedures. 

[Box  and  Jenkins,  1976]  developed  the  familiar  conditional  sum  of  squares 
for  the  Identification  of  univariate  MA  processes  and  treated  the  ARMA 
case  as  a  special  MA  of  infinite  dimension.  This  method  was  later  general¬ 
ized  by  [Newbold,  1974]  and  [Ali,  1977]  to  mixed  processes.  They  obtained 
expressions  for  the  inverse  and  determinant  of  the  sample  correlation 
matrix  from  which  the  exact  likelihood  could  be  computed.  [Osborn,  1976] 
applied  the  same  approach  to  the  case  of  multivariate  moving  average  pro¬ 
cesses. 

Using  a  different  type  of  linear  transformation  of  the  Input  white  noise 
sequence,  [Phadke  and  Kedem,  1978]  showed  how  to  obtain  exact  likelihood 
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for  a  pure  moving  average  process  by  Cholesky  decomposition  of  the  corre¬ 
lation  matrix.  This  approach  was  generalized  to  the  mixed  models  case  by 
[Ansley,  1979],  who  presented  the  first  really  efficient  algorithm. 

[Akaike,  1973]  and  [Anderson,  1978]  have  tried  to  obtain  an  analytical 
solution  to  the  problem  of  exact  maximum  likelihood  for  vector  ARMA  pro¬ 
cesses.  [Akaike,  1973]  formulates  the  problem  directly  as  the  identifica¬ 
tion  of  a  Gaussian  model  by  numerical  maximization  of  the  Gaussian  likeli¬ 
hood  function.  Following  [Kashyap,  1970],  he  concentrates  on  obtaining 
expressions  for  the  gradient  and  Hessian  of  the  log  likelihood  function  to 
be  used  In  a  Newton-Raphson  type  non-linear  procedure.  All  these  methods 
are  In  fact  approximate  In  the  sense  that  they  consider  the  conditional 
likelihood  with  conditioning  on  some  fixed  Initial  conditions.  [Anderson, 
1978]  and  later  [Arhabi,  1978,  1979]  further  developed  the  method  and 
expressed  the  likelihood  function  In  terms  of  the  ratio  of  the  periodogram 
to  the  spectral  density  function  of  the  modi?! .  Along  these  lines,  the 
following  work  Is  also  relevant:  [Tretter  and  Steiglitz,  1967],  [Tunni- 
cliffe  Wilson,  1973],  [Dunsmuir  and  Hannan,  1976],  [Shaman,  1973],  [Gal¬ 
braith  and  Galbraith,  1974]. 

All  the  preceeding  methods  can  be  regarded  as  computationally  impractical. 
The  success  of  the  maximum  likelihood  theory  as  an  identification  proce¬ 
dure  for  ARMA  processes  Is  directly  tied  up  with  the  ability  to  efficiently 
compute  the  likelihood  function. 

An  alternative  representation  of  an  ARMA  process  is  the  Markovian  represen¬ 
tation,  Introduced  by  [Akaike,  1974].  As  early  as  1965,  [Schweppe,  1965] 
had  Indicated  how  Kalman  filtering  theory  could  be  used  to  get  the  exact 
likelihood  function  In  the  scalar  case.  Later  [Harvey  and  Phillips,  1979] 
further  developed  the  theme.  The  method  has  been  adapted  to  processes 
with  missing  data  In  a  very  useful  paper  by  [Jones,  1980].  Recently 
[Gueguen  and  Scharf,  1980],  using  a  somewhat  different  approach,  based  on 
the  Innovations  representation  of  an  ARMA  time  series,  have  given  new  and 
Interesting  filtering  Interpretations.  These  Interpretations  show  the 
connection  between  Markovian  and  innovations  representations  of  a  time 
series  and  show  how  the  Kalman  gain  vector  is  related  to  the  impulse  re¬ 
sponses  of  increasing  order  autoregressive  models  fitted  on  the  data. 
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In  this  paper  we  generalize  these  concepts  to  vector  autoregressive  moving 
average  models  and  obtain  efficient  recursive  procedures  to  compute  the 
exact  likelihood  function  In  fast  Kalman  predictor  and  fast  lattice  forms. 

2.  EXACT  LIKELIHOOD  FOR  VECTOR  ARMA  PROCESSES 

We  assume  that  a  finite  set  of  observations  {yj, . . . ,yj}  on  a  d  dimensional, 
zero  mean,  wide  sense  stationary  random  process  {y^}  is  given.  The  problem 
to  solve  is  one  of  fitting  a  vector  ARMA  (p,q)  model,  q  <_  p, 

P  q 

I  a  v.  =  Z  b  u.  an  =  1,  for  all  t  (1) 

m=0  m  t-”  n*0  "  t'n  0 

to  the  data.  {u. }  is  an  input  d  dimensional,  zero  mean  Gaussian  white 
t  y 

noide  process  whose  (dxd)  correlation  matrix  is  Elu^u^)  =  W. 

The  likelihood  function  is  defined  as  the  joint  probability  density  of  the 
set  of  vector  data  ( y q ..... yjjj ) ,  evaluated  at  the  observations  and  param¬ 
eterized  by  the  ARMA  parameters.  Define  the  following  ((N+l)dxl)  observa¬ 
tion  vector 

y  s  [yg»  •  •  •  *y^ 

The  vector  ^  is  distributed  as  a  multivariate  N(0,R)  where  R  is  an 
(N+l )dx(N+l )d  block  Toeplitz  matrix: 


We  shall  derive  several  expressions  for  the  likelihood  function:  convention¬ 
al,  Markovian,  and  innovations. 

2.1.  Conventional  representation  of  the  likelihood.  The  (N+l )dx(N+l )d 
matrix  R  is  block  Toeplitz  symmetric  but  is  not  symmetric  Toeplitz.  The 
likelihood  function  takes  the  form  of  the  well  known  multivariate  normal 
density 
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L<*>  *  (2i)M)/2  lRl'1/2  “Pt-  R'V  (2) 

and  the  log-likelihood  reduces  to 

log(L(y))  *  -  1  log  2tt  -  1  log  |Rj-  j  uj  u  (3) 

u  .  **’«* 

This  is  the  vector  closed-form  often  encountered  in  the  literature  [Ansley, 
1979],  [Anderson,  1978].  It  is  highly  non-linear  in  terms  of  the  ARMA 
parameters  (a^ ,. . .  ,a  ,bg, . . . ,b  ,W) .  Several  approaches  can  be  considered 
to  maximize  this  expression.  The  conventional  direct  method  consists  in 
deriving  an  approximate  version  of  the  likelihood  by  conditioning  on  some 
initial  conditions. 


2.2.  Markovian  representation  of  the  likelihood.  An  alternative  approach 
is  to  use  the  structure  introduced  in  the  Markovian  representation  of  the 
process  (yt>.  A  stochastic  process  {yt>  is  said  to  exhibit  a  Markov  prop¬ 
erty  if  the  future  behavior  of  the  process  can  completely  be  described  by 
some  present  state  and  the  future  Input.  The  state  condenses  all  the 
information  of  the  present  and  past  of  the  process  i y 1 1 .  Assuming  the 
process  is  stable  and  minimum  phase,  [Akaike,  1974]  has  established  the 
equivalence  of  the  Markovian  and  ARMA  representations.  The  finiteness  of 
the  dimension  of  the  predictor  space  of  an  ARMA  process  is  the  fundamental 
characteristic  of  a  process  with  a  Markovian  representation. 


Internal  structure.  The  finite  basis  of  the  predictor  space  is  chosen  as 
the  state  of  the  process.  The  process  {yt>  has  the  Infinite  MA  represen¬ 
tation 


k*0 


\  ut-k 


(4) 


t-1 


where  u^  *  y^  -  y^ 

of  the  different  predictors  yt 
given  by 


is  the  Innovation  of  {y.}  at  time  t.  The  analysis 


t-1 


yt+l 


t-1 


.,  reveals  that  they  are 


y 


t+s 


t-1 


,yt+p-l  s  = 
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where  L  represents  a  linear  transformation.  The  predictor  space  is  then 
of  finite  dimension  p  and  the  vector  of  one-step  ahead  to  p-step  ahead 
predictions  forms  a  basis  for  the  predictor  space.  The  state  space  model 
we  obtain  in  the  following  paragraphs  is  based  on  the  concept  of  a  finite¬ 
dimensional  predictor  space.  It  is  slightly  different  from  the  one  given 
by  [Akaike,  1974]  since  our  definition  of  the  state  is  a  shifted  version 
of  his. 

Notice  from  the  infinite  MA  representation  (4)  that  the  predictors  are 
written. 


*t+1+l 


■  y 


t+i 


t-1 


+  ut-l  f°r  ’  —  0- 


(5) 


From  (1),  the  subsequent  relation  also  holds: 


rt+p 


s  y 


i*l 


i  ^t+p-i 


t-1 


+  h  u  . 
p  t-1 


(6) 


These  recursions  impose  an  internal  structure  on  the  process.  That  is,  if 
we  define  the  (dpxl)  state  vector  =  [y^ 


t-1 


’•••’  yt+p-iL  we °btain 


t-i 


xt  - A  xt-i  +  8  Vi 


The  vector  ut  *  y^ 


yt 


is  the  innovation  of  the  process  as  defined 
t-1 


earlier  and  has  a  multivariate  normal  density  with  zero  mean  and  covari¬ 
ance  matrix  W.  A  and  B  are  respectively  a  (dpxdp)  block  state  matrix  and 
a  (dpxd)  input  matrix,  defined  as  follows: 


A  * 

0  I 

•  0 

0 

and 

B  = 

V 

I 

_“ap  •  ■  •  •  _al_ 

.v 

Hence  the  stationary  stochastic  process  {y^}  is  represented  as  a  linear 
map  from  the  Input  innovation  process  (ut }  to  {y^J  via  a  Markovian  state 
space  structure 


xt  8  A  xt-l  +  B  Vl 
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yt  *  c  Xt  +  ut 


with  C  defined  as  the  output  (dxdp)  matrix  [I  0  . 


Initial  conditions.  The  wide-sense  statlonarity  of  the  process  {yt) 
imposes  the  condition  that  at  time  t=0  steady  state  has  been  achieved. 
This  means  the  time  origin  is  rejected  back  to  t*-»  and  the  initial  state 
Xq  is  a  zero  mean  (dpxl)  random  vector  with  probability  density  N(0,Qq). 
The  (dpxdp)  covariance  matrix  Qq  satisfies  the  Lyapunov  equation 

Qn  *  A  Qn  AT  +  B  W  BT 


First  order  descriptor.  The  first  order  descriptor  of  the  vector  process 
is  the  response  to  a  unit  pulse  applied  as  the  eth  element  of  the  vector 
input  u't.  The  resulting  output  corresponds  to  the  column  of  the  (dxd) 

impulse  response  matrix  ht.  The  impulse  response  matrix  is  directly 
obtained  by  replacing  the  input  vector  ut  by  a  (dxd)  input  diagonal  matrix 
At,  [Wolovich,  1974] 


—  diag 

.  .  1  t  =  0 
1  0  t  f  0 


The  first  order  descriptor  is  then 


for  all  t  <  0 


h0  8  1 

ht  «  CT  At-1  B 


t  =  0 
for  all  t  >  0 


Second  order  descriptor.  The  lagged  correlation  matrices  for  the  random 
process  {y^}  are  defined  by  rt  *  E(ytyJ).  Using  the  above  Markovian  repre¬ 
sentation  we  obtain 


-  9.8  - 


ft  *  E[(C  Xt  +  Ut)(C  Xg  Uq)  ^ 

T  t  T  t-1 

*  C  Ax  Q0  C  +  C  Ax  B  W 

That  Is,  the  correlation  matrix  for  lag  t  is 

r  *  CT  At  Q0  C  +  ht  W  .  t  >  0 

These  equations  are  straightforward  generalizations  of  the  scalar  version 
[Gueguen/Scharf ,  1980], 


Like! ihood.  In  this  representation,  the  stationary  sequence  of  prediction 
residuals  (u^. }  may  be  written 


This  sequence  is  a  sequence  of  i.i.d.  random  vectors  with  normal  distri¬ 
bution,  N(0,W).  The  joint  distribution  of  the  input  vector  =  (uj,..., 
uj)  is  then  readily  written  as  the  product  of  normal  distributions, 

N 

f(un,...,uN)  *  it  N  (0,W) 
u  n  t-0  ut 

These  residuals  may  be  computed  causally  from  the  time  series  values 
£  =  (yQ,...,y^)  provided  the  initial  state  is  given: 


t-1 


0-1 


yt  -  yt 
cT  x* 


t-i 


A  xt-i  +  B  ut-i 

cTxn 


Thus  we  may  write  the  conditional  likelihood  function 


L(*/X0) 


N 

ir  f{ut  +  yt 
t-0  x 


N 

/Xn}  -  it 
t-1  u  t-0 


K  (y* 


‘t-i 


W) 


(10) 


The  exact  likelihood  function  of  the  process  is  then  obtained  by  integrat¬ 
ing  over  all  the  realizations  of  the  random  initial  state  XQ  which  has  a 
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normal  density  N(0,Qq).  Then 

N 

L (*)  *  /  ■  N  (y  ,  W)  N.  (0,Qq)  d  Xn  .  (11) 

all  XQ  t«0  yt  *  t-1  xo  u  0 

This  result  generalizes  the  scalar  result  of  [Gueguen/Scharf ,  1980].  It 
Is  of  theoretical  Interest  as  we  shall  see  later,  but  of  no  computational 
value.  A  maximum  likelihood  Identification  procedure  based  on  the  Markov¬ 
ian  representation  could  be  derived  using  the  conditional  likelihood  (10). 
An  Interesting  problem  then  is  one  of  choosing  the  appropriate  initial 
state  Xq  [Gueguen/Scharf,  1980]. 

2.3.  The  Innovations  Representation  of  Likelihood.  [Gueguen  and  Scharf, 
1980],  drawing  on  Ideas  from  [Anderson  and  Moore,  1978],  showed  how  a 
linear  time  varying  predictor,  or  innovations,  representation  could  be 
derived  from  the  Markovian  representation.  The  essential  advantage  of 
such  a  representation  is  that  the  Initial  predictor  state  is  no  longer  a 
random  vector,  but  is  rather  an  identically  zero  vector.  These  ideas 
generalize  to  the  multivariate  case.  This  feature  is  of  inestimable  value 
in  likelihood  theory,  as  we  shall  see. 


Internal  Structure.  We  are  looking  for  a  model  that  will  produce  an  out¬ 
put  yt  that  has  the  same  statistical  properties  as  the  original  ARMA  pro¬ 
cess.  The  innovations  model  maps,  some  time-varying  zero-mean,  Gaussian 
white-noise  (u  }  into  {y  }  through  a  time  varying  structure.  The  states 

are  still  defined  as  a  (dpxl)  vector  of  predictors,  X;  *  [y 

1  1  t-1 

3,  but  now  they  evolve  according  to  the  time  varying  state  equa- 


vT 

yt+p-l 

tion. 


t-1 


Xt  *  A  Xt.1 


+  k 


t-1  ut-l 


(12) 


Here  kt  Is  a  (dpxd)  time  varying  Kalman  block  vector  that  replaces  the 
time  Invariance  B  of  the  Markovian  representation: 


k 


T 

t 


* 


and  (ut>  is  a  multivariate  nonstationary  N(0,wt)  innovation  or  prediction 
residual  that  replaces  the  stationary  residual  sequence  {ut>  in  the 
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Markovian  representation.  wt  *  E[utu^]  is  a  (dxd)  zero-lag  correlation 
matrix.  The  output  Is  then  given  by  the  Markovian  measurement  equation 


CT  Xt  ♦  ut 


Initial  conditions.  The  origin  of  time  may  be  brought  back  to  t=0  and  the 
initial  conditions  may  now  be  given  deterministically  as 


Qq  =  E[XqXq]  =  0 


(13) 


First  order  descriptor.  Substituting  the  diagonal  matrix  a  in  place  of 
the  vector  u^  leads  to  the  following  expression  for  the  impulse  response: 

h*  =  0  for  all  i  <  0 

h*  =  I  i  =  0  (14) 

ht  «  CT  A1"1  kt  for  all  i  >  0 

The  term  h*  is  the  response  of  the  representation  (12)  at  time  t+i  to  an 
input  At  that  applies  a  diagonal  input  at  time  t. 


Second  order  descriptor.  The  (dxd)  matrix  correlation  matrix  sequence 
{r*}  of  the  output  process  is  readily  obtained  as  follows: 

(15) 

=  CT  A1  P  C  +  CT  A1"1  k  wt 

where  *  E[XtX^]  is  the  (dpxdp)  zero-lag  state  correlation  and  is  charac 
terized  by  the  Lyapunov  equation 


t+1 


A  Pt+1  A  +  kt  wt  kt 


The  right  choice  for  the  Kalman  vector  k.  and  the  correlation  matrix  w. 
can  make  { r^ }  become  time  invariant. 


Choose  k^  and  w^  such  that 

ktwt  =  -A  Pt  C  +  A  Qq  C  +  BW 
and 


The  nice  result  here  is  the  following:  if  one  is  interested  only  in  the 
second  order  properties  of  a  stationary  vector  process,  one  can  replace 
the  time-invariant  Markovian  representation  by  a  time  varying  innovations 
representation  which  gives  the  same  mean  value  and  correlation  sequence 
and  whose  advantage  is  that  the  initial  conditions  are  deterministically 
set  at  t=0. 


Likelihood.  Now  we  use  the  important  property  that  the  initial  state  in 
the  innovations  representation  is  set  at  zero.  The  innovations  process  is 
then 


ut  =  h  ’ 


N(0,wt) 


Thus  the  vector  of  observation  ^  will  have  the  same  distribution  as  the 

input  vector  u^  with  the  different  mean  (yl  n  )  *  yT. 

*  t-1  1  t-1 

Hence  jr  is  distributed  as  N(£,J^),  where  is  the  d(N+l  )xd(N+l )  diagonal 
matrix  nN  *  diag(wN, . . . ,wQ) .  As  a  result  the  likelihood  function  is 
expressed  as  a  finite  product  of  normal  N  (y  ,w.)  densities: 

y*  ^  ^ 


L(jr)  *  u  N  (yt  ,w  } 

4.  _  Jr  4.  »  *  1  *• 


The  log  likelihood  of  the  observations  is  formulated  in  the  favorable 
form. 


W 1 


In  this  form,  the  log  likelihood  function  depends  only  on  the  innovations 
values  u^  and  variance  values  w^.  Both  and  are  non-stationary 
sequences  that  have  a  finite  time  dependence,  and  are  given  at  each  time  t 
by  the  Kalman  filter  defined  for  the  innovations  representation.  This 
means  that  we  have  now  a  recursive  way  of  calculating  the  exact  likelihood 
function  of  vector  ARMA  processes.  The  values  so  obtained  are  then  fed 
into  a  non-linear  optimization  procedure  of  the  Newton-Raphson  type  that 
provides  an  optimal  set  of  parameters.  This  procedure  is  repeated  till 
convergence  is  achieved,  and  maximum  likelihood  estimates  of  the  ARMA 
parameters  obtained.  This  supposes  that  the  orders  (p,q)  of  the  ARMA  has 
been  determined. 


2.4.  Innovations  representation  and  an  important  Chapman-Kolmogorov 
equation.  The  two  previous  paragraphs  have  been  devoted  to  the  derivation 
of  the  likelihood  function  of  the  process  {yt } .  By  comparing  the  likeli¬ 
hood  expressions  (11)  and  (16),  obtained  respectively  for  the  Markovian 
and  innovations  representations,  we  see  that  a  very  important  Chapman- 
Kolmogorov  equation  has  been  solved: 


N 

IT 

t=0 


N  (y* 
y  Jt 


N 

/  * 
all  XQ  t*0 


N  (yt  ,W)N  (0,Q  )dX 
yt  1 1-1  X0  U  U 


07) 


*t  s  A  *t-l  +  ktut-i  ;  E  0 
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yt  -  cT  *t  i  %-i  :  n(cT  xo-  cT  qo  c! 

I 

xt  “'t*  But-i  i  x0  •  N<o.Q0) 


2.5.  Comments .  The  maximum  likelihood  identification  procedure  may  be 
conducted  in  the  following  way.  First  start  with  an  initial  guess  of  the 
ARMA  parameters  and  compute  the  corresponding  correlation  sequence.  (In 
the  scalar  case,  [Dugre/Beex/Scharf ,  1980]  proposes  an  algorithm.)  Then 
run  the  Kalman  filter  to  obtain  the  value  of  the  exact  likelihood  function. 
This  value  is  fed  into  a  non-linear  optimization  procedure  that  updates 
the  vectors  of  parameters. 

One  may  want  to  speed  up  the  computations  involved  in  Kalman  filtering. 
Appealing  to  the  formulae  (14),  it  is  seen  that  there  exists  a  close 
relationship  between  the  time  varying  impulse  response  sequence  {h*}  and 
the  vector  Kalman  gain.  We  shall  use  this  feature  extensively  in  the  next 
section  to  derive  a  Fast  Kalman  Algorithm  (Morf,  Sidhu,  and  Kailath  algo¬ 
rithm)  that  will  avoid  the  solution  of  the  Ricatti  equation.  This  also 
leads  to  a  fast  lattice  implementation. 

3.  FAST  ALGORITHMS 


Recall  the  expressions  of  the  exact  likelihood  (3)  and  (16)  obtained  by 
the  conventional  method  and  the  innovations  method.  It  is  clear  that 

N 

Log | R  J  =  I  logjw. | 
t-0 

and 


1  N 

2  ^  ^t”^t 

C  t=0  Z  Z 


>V<vxtl 


Thus  the  innovations  representation  solve  the  triangularization  of  the 
inverse  correlation  matrix  to  obtain  the  white  vector 


u  .  R-1'2  *  . 


-1  /2 

or  since  R  '  Is  Invertible 
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*  =  R1/2  u 


But  the  triangular  or  Choleski  decomposition  is  easily  obtained  from  the 
innovations  representation  (12).  If  we  write  the  observation  vector 
-  [yj,...,yj]  upside  down,  we  obtain  the  upper  triangular  block  matrix 
equation: 


Here  the  upper  triangular  block  matrix  Kw  is  (N+l)d  x  (N+l)d  dimensional 

i  N 

The  block  elements  {h^}  of  the  matrix  are  defined  as  the  time  varying 
impulse  response, 

h*  =  CT  A1"1  kt 
h0  *  1 

The  block  correlation  matrix  R  is  computed  as  the  expected  value  of  the 
outer  product  of  the  vector  of  observations: 


R  *  E  {  •  [yl  . . .  yll} 
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Thus  the  ((N+l)d  x  (N+l)d)  dimensional  block  correlation  matrix  is  decom¬ 
posed  into  the  following  triangular  form: 


R  =  KN  E  {  1  ^UN  •••  uq]}kJ 


That  is 


ro  ri  •  • 
r-i  ro 


ro  r1 
r-i  rc 


hN  1  fWN 


wo  (hS>T  1 


Thus  we  may  write 

R  -  Kn  %  -  r1/2r1/2T 


r^  £  /p  0/2 

whpre  nN  is  the  diagonal  block  matrix  «  diag(wN  ...  wQ).  The  triangular 
matrix  is  invertible.  Therefore  we  may  write  (20)  in  the  form 

K'1  R  *  nN  kJ  (21) 

By  analogy  with  the  scalar  case,  these  equations  are  called  the  multivari¬ 
ate  normal  equations.  The  inverse  of  the  matrix  K^,  is  redefined  as  a^: 

fi  1 


-1 

aw  *  kn 


I  a 


I  -I 


Now  the  analogy  with  the  scalar  case  is  much  clearer.  The  block  vector 

[I  a!J* . ajj3  represents  the  vector  of  parameters  of  an  increasing  order 

vector  autoregressive  process  fitted  on  the  correlation  matrix  of  the  pro¬ 
cess.  This  problem  is  solved  in  the  scalar  case  by  the  Levinson-Durbin 
algorithm  [Levinson,  1947],  [Durbin,  I960].  Equivalently,  one  might  solve 
(21)  directly  for  the  time  varying  impulse  response  ht  using  the  impulse 
response  algorithm  [Leroux/Gueguen,  1977].  [Robinson  and  Wiggins,  1965] 
entended  the  Levinson  algorithm  to  the  multivariate  case. 

3.1.  The  Generalized  Impulse  Response  Algorithm.  Here  we  want  to  derive 
an  algorithm  to  obtain  directly  the  matrix  impulse  responses  h^  that 
appear  in  the  matrix  K^.  It  must  not  be  forgotten  that  we  are  after  a 
fast  algorithm  to  calculate  the  block  Kalman  gain  vector  kt.  From  rela¬ 
tion  (14),  it  is  seen  that  the  matrix  impulse  response  sequence  is  given 
by. 


where  =  [I  0,...,0]  and  A  is  the  block  companion  matrix  given  in  Sec¬ 
tion  2.2.  This  means  that  the  block  Kalman  gain  vector  is  composed  of 
the  first  p  impulse  response  matrices; 

=  CCh^)T. - (hP)T] 

The  generalized  impulse  response  algorithm  provides  a  recursive  method 
for  calculating  the  Kalman  gain  vector. 

Let's  write  the  forward  and  backward  predictors  for  orders  p=0  to  N; 


Forward  Predi ction 


k  r  k  T 

Here  S^*R^(h*)  and  to  connect  the  algorithm  to  the  solution  of  (21),  one 
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should  have  In  mind  that  R^«w^.  The  order  p  Is  the  order  of  the  AR  part 
of  the  ARMA  ( p , q) . 


Backward  Prediction 


K  nr.*  r>„  .,1  r«V»V*V, 


r-!  '-I’o 


•-r-2N  r-N-l  Lr-N 


VN-1  yN-1 

*0  1 


'0-1  V-p-  •  V?! 


From  the  matrix  relation  (23),  the  following  results  hold: 


J  N 

1  &l  r-Ul 
JL=0  1  1 


for  all  N 


But  the  expression  for  the  difference  a^  -  a^"^  is  given  by  the  [Robinson- 
Wiggins  algorithm,  1965]  and  hence  substituting  this  expression  in  (25) 
yields 


r„r  -,-1  "  -N-l 

“  ~aN-l ^RN-1  ^  BN-£  r-i-£ 

=  I  for  all  N 


We  recognize  from  (24)  that  the  sum  is  exactly 


VN-1  «  t  hN-1  r 

V-1-N  Z_  bN-£  f-1-£ 
£*1 


Therefore  the  elements  satisfy  the  recursion 


cN  ,  -N-l  rer  V1  ..N-l 

S1  S1  ‘  VltVll  V-i-N 


i  >  0 


The  same  derivation  is  applied  to  and  it  is  easily  shown  that: 


VN  -  VN‘1  -  s  FRe  r1 
V-N-1-l  V-N-i-l  °  N-l  ‘•n-l J  Si 


1  >  0 


(28) 
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The  equations  given  Rj+1  and  Rjjj+1  do  not  need  to  be  changed  and 
remain  valid  provided  that  we  can  get  an  expression  for  aN  *  BN  that  does 
not  Include  either  the  forward  or  the  backward  coefficients.  From  the 
Robinson-Wiggins  algorithm  we  know  that 

aN  *  °N-1[RN-1]  1  (2g) 

bN  "  -fiN-lCRN-l3  1 

From  (22)  and  (23),  one  can  extract  the  expression  for  bjj  given  here: 

hN  «  R  TRe  T1  *  -VN"1rsN“1T1  .  (30 ) 

bN  '6N-1LRN-1J  V-N  L^0  J 

N 

The  same  procedure  leads  to  a  similar  expression  for  a^: 

N  ,  „  rRr  -|-1  .  sN-lrvN-l-|-l  (31 

aN  ■aN-ltRN-lJ  5-N  LV0  J 

The  generalized  impulse  response  algorithm  is  then  summarized  as  follows: 


<-N  .  cN-1  fRr  rl  yN-l 
si  si  “N-1lkN-1j  -N 


for  all  i 


vi  ’  Vl'’  -6N-1[RN-Ir1  S-ilH  for  811  1 


RN  *  RN-1  aN 

RN  *  RN-1  "W  eN 


rRr  rl  «  sN-l[vN-lrl 
aN-lLRN-lJ  -N  LV0  J 

rRe  r1  -  v"“1csN-1r1 

6N-1LRN-1J  v-n  L  0  J 

Initial  Conditions.  The  Initial  conditions  are  straightforward  and  given 


[Vq» • • • »Vfl]  *  [rQ» • • • »rf^ 

[sj,...,s5]  *  [r_N,....r0] 


In  the  scalar  case  It  reduces  to  the  Impulse  response  algorithm  proposed 
by  [Leroux/Gueguen,  1977].  By  analogy,  the  terms  aN_-| CR^_i 3”  and 
eN-1^RN-l^  will  *>e  called  multivariate  forward  and  backward  reflection 
coefficients  (or  parcor  coefficients). 

Remarks.  This  algorithm,  although  a  recursive  way  of  computing  the  Im¬ 
pulse  response  of  the  Increasing  order  vector  AR  processes,  does  not 
Improve  the  number  of  computations  a  great  deal.  In  fact.  If  one  wants 

to  obtain  the  Kalman  vector  kJ  «  C(h![,)T . (hj)1],  one  has  to  start  the 

algorithm  with  the  knowledge  of  the  matrix  correlation  sequence  for  -n-p-1 
£l<p.  This  Interval  depends  on  N. 

So  far  the  generalized  impulse  response  algorithm  has  been  derived  with¬ 
out  using  the  Information  we  have  of  the  Internal  structure  for  the  pro¬ 
cess  . 


3.2.  Generalized  Impulse  Response  Algorithm  for  ARMA  Processes:  The 
Morf,  Sldhu  and  Kailath  (MSK)  Algorithm.  We  show  here  that  the  general¬ 
ized  Impulse  response  algorithm,  modified  to  take  account  of  the  ARMA 
structure,  is  Identical  to  the  [Morf,  Sidhu  and  Kailath,  1974]  algorithm 
derived  from  the  Chandrasekar  type  equations.  As  the  process  {y^. }  is 
ARMA,  It  behaves  like  an  AR  on  Its  tail.  That  is,  for  N>0,  the  matrix 
correlation  sequence  {rt>  satisfies  the  AR(p)  recursion, 

rN+l+p  +  al  rN+p  +  *•*  +  ap  rN+l  *  0  ^or  a11  N  -  °* 

Here  the  {a^ }  are  the  true  parameters  of  the  autoregression.  After  trans¬ 
position,  equation  (33)  becomes, 

(34) 


-N-l-p 


-N-l 


,r 


N-pJ 


to  r 


-N-l-p 


If  we  write  this  equation  for  r  .j 
equation  Is  true: 


,  then  the  following  matrix 
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r-  N- p- 1 


-1  r-2 

r-l 

-2 

r-p-l 

-N-l 

r-N-p 

Premultiply  by  the  row  block  vector  of  backward  prediction  coefficients 
and  obtain: 


r.N  kN  T1  UN 

CbN, . . . »b1  I]  r_p_2  ■  -  v_N-p-l 


Lr-N-p-lJ 

Therefore  the  Internal  structure  imposed  by  the  ARMA  nature  of  the  pro¬ 
cess  is  characterized  by  the  relation 


-N-p-1 


rvN  1 

LV-N-l  v-N-2-V-N-pJ 


l_'al 


N  N 

This  indicates  that  we  need  only  to  compute  p  values  V  N  ^  to  V  N  .  The 
initial  conditions  are  now  given  by  {r..}+P,  independent  of  N.  This  obser¬ 
vation  summarizes  the  essence  of  the  Fast  Kalman  algorithm  and  underlies 
the  MSK  algorithm.  From  (37),  the  vector  [V^  ^  •••  VNN  p_^]  Is  given 
by  the  linear  transformation. 


rvN  vN  1  *  ry^  yN 

-N-2  •••  v-N-p-lJ  L-N-l  v-N-pJ* 


N  N 

where  A  Is  the  state  companion  matrix.  The  recursions  giving  and 
In  the  generalized  Impulse  response  algorithm  (32)  are  written  for  i»l, 
. . . ,  p : 


<N+1  .  ror-i-l  yN 

S1  S1  ”VRN^  V-N-i-l 

yN+l  B  yN  g  rRe-|-l 

V-N-1-l  v-N-1-leNLKNJ  51 
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Define  now  the  following  (dxdp)  dimensional  block  vectors 


(s|)T- 

(Sp>T_ 

and  V.  ■ 

— t 

(V*.t.p)T. 

Then  the  set  of  equations  (39)  is  condensed  Into  the  vector  form 


^N+l 

VT 

-£n+1 


_aN^RN^  ^ 

Vfl  at  -bn[rJ]  1  kJ 


(40) 


It  is  also  noted  that  *  s!!, 

n  o 

with  the  expressions  for  ^[R^] 


and  R, 
-1 


r 
N 
and 


■  Vq.  These  identities, 
given  in  (32), 


together 
prove  that: 


N1]1  *  CT  ^  CT  =  [I  0 . 0] 

>vj  c-s 


Finally,  for  purposes  of  easy  comparison  with  the  MSK  algorithm,  define 
the  matrix  Mk  as  the  inverse  of  the  matrix  of  mean  square  backward  pre¬ 
diction  errors: 

\  *  Kr1 

Then  the  generalized  impulse  response  algorithm  for  a  vector  ARMA  process 
is  summarized  In  these  equations: 

*N+1  'A  V„  Vj  C 

^N+l  *  A  "£n^RIP  T  [T  (42) 

♦  CT  vj  c 

[«Kt,]T-[MN]T-[HNfvJctR5r1CTV(([HN]T  . 

This  algorithm  Is  readily  recognized  as  one  form  of  the  MSK  algorithm 
[Morf/SIdhu/Kallath,  1974],  [Friedlander  et  al .  1978],  applied  on  the 
Innovations  state  space  model.  To  complete  the  Identification  the  reader 
should  note  that  the  matrix  CT  and  the  block  vector  \L  In  (42)  correspond 
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respectively  to  the  matrix  H  and  the  block  vector  In  the  MSK  algorithm. 
The  number  of  computations  Is  significantly  reduced  compared  to  the  direct 
solution  of  the  Riccati  equation  or  to  the  generalized  Impulse  response 
algorithm.  It  should  be  clear  that  a  number  of  operations  can  be  per¬ 
formed  In  parallel. 

Remarks .  Recall  that  we  are  after  an  algorithm  to  compute  the  block  Kal¬ 
man  gain  vector  in  order  to  calculate  the  exact  likelihood  function.  The 
vector  Kt  Introduced  in  (40)  is  defined  as 


where  k^  is  the  block  Kalman  gain  vector  we  want  and  fi  is  the  block 
diagonal  correlation  matrix  of  the  Innovations  process,  fit*diag(wt , . . . ,wQ). 

The  same  derivation  can  be  worked  out  in  the  scalar  case  starting  from  the 
scalar  Impulse  response  algorithm.  It  leads  to  the  scalar  version  of  the 
MSK  algorithm  used  by  [Pearlman,  1980]. 

The  generalized  Impulse  response  algorithm  or  the  fast  Kalman  algorithm 
are  also  fast  algorithms  to  generate  the  generalized  reflexion  coefficients 
aN-l C^N-1  anc*  eN-l^RN  1^~  ‘  ^ence  the  method  can  be  implemented  as  a 
fast  lattice  as  well . 

4.  CONCLUSIONS 

In  this  paper  we  have  derived  a  recursive  procedure  to  compute  the  exact 
likelihood  for  vector  ARMA  processes.  The  key  to  the  method  was  the 
"Innovations"  representation  of  the  process  that  allowed  the  use  of  Kalman 
filtering  techniques.  The  Kalman  vector  gain  was  shown  to  be  composed  of 
time  varying  Impulse  response  values  of  the  process.  This  motivated  the 
derivation  of  the  generalized  Impulse  response  algorithm.  Finally,  intro¬ 
ducing  the  knowledge  of  the  Internal  structure  of  the  ARMA  process,  we 
showed  that  the  generalized  Impulse  response  algorithm  was  Identical  to  the 
MSK  or  fast  Kalman  algorithm  derived  from  the  Chandrasekar  type  equations. 
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ABSTRACT 

A  general  framework  for  deriving  and  interpreting  analysis  and  synthesis 
spectra  of  the  autoregressive  (AR)  and  moving  average  (MA)  type  is  reviewed. 
Linear  transformations  of  finite  data  records  provide  for  a  unified  treatment 
of  spectra  as  different  as  the  Bartlett  (BA),  the  maximum  likelihood  method 
(MLM)  and  the  maximum  entropy  method  (MEM)  spectra.  We  then  generalize  these 
ideas  to  ARMA  linear  transformations.  An  ARMA  type  spectrum  is  obtained  and  re¬ 
lated  to  other  ARMA  spectra.  A  parameterization  scheme  is  proposed. 

INTRODUCTION 

In  many  applications  of  signal  processing  such  as  sonar,  radar,  speech,  and 
communications,  only  short  segments  of  the  processes  are  available.  This  situ¬ 
ation  arises  due  either  to  inherent  nonstationarities  that  force  segmentation  or 
to  the  short  interval  over  which  the  signal  can  be  observed.  Hence  the  problem 
of  estimating  spectral  density  functions  from  a  finite  set  of  time  series  ob¬ 
servations  is  a  crucial  step  in  the  modeling  of  underlying  data.  Current  re¬ 
search  activity  is  primarily  centered  on  "high  resolution"  or  parametric  methods 
of  spectrum  analysis  (cf  ex:  (l)-(4)).  Recently  a  general  framework  for  deriving 
and  interpreting  analysis  and  synthesis  spectra  of  the  autoregressive  (AR)  and 
moving  average  (MA)  type  has  been  introduced  (5) (6). 

In  most  techniques,  stationary  time  series  are  modeled  as  the  output  of  time 
Invariant  linear  systems  (filters)  driven  by  stationary  white  noise  sequences. 
When  dealing  with  finite  length  data  records  the  problem  of  initial  conditions 
arises  and  is  generally  avoided  by  making  some  assumptions  on  the  data  outside 


Che  interval  of  observation. 

Alternatively,  the  proper  initialization  may  be  obtained  by  identifying  time 
varying  models  where  the  initial  conditions  manifest  themselves  as  time  vari¬ 
ations.  In  this  approach,  the  vector  of  data  is  viewed  as  the  result  of  a 
linear  transformation  applied  to  a  vector  of  uncorrelated  values.  This  formu¬ 
lation  leads  to  a  unified  presentation  and  interesting  interpretations  of  spec¬ 
tral  estimates  such  as  the  Bartlett  (BA),  maximum  likelihood  method  (MLM)  and 
maximum  entropy  method  (MEM)  spectra  (5)£).  Depending  on  the  type  of  data  being 
analyzed  we  are  able  to  improve  the  performance  of  the  BA  and  MLM  spectra  using 
a  parameterization  procedure.  It  appears  that  these  procedures  provide  a  method 
of  classifying  data  as  AR,  MA  or  ARMA.  These  results  are  briefly  reviewed  in 
the  first  part  of  this  paper. 

We  generalize  these  concepts  to  the  definition  of  an  ARMA  linear  transfor¬ 
mation  on  finite  length  data  records  using  results  from  (7)  and  the  statistics  ' 
literature  (8),  (9).  This  leads  us  to  the  derivation  of  an  ARMA  spectral  esti¬ 
mate  analogous  to  the  MLM  spectrum  in  the  AR  case  or  the  BA  spectrum  in  the  MA 
case.  Its  relationship  to  "high  resolution"  ARMA  spectra  is  also  discussed.  As 
in  the  AR  and  MA  case,  it  is  speculated  that  an  appropriate  parameterization 
procedure  would  improve  the  performance  of  such  a  spectral  estimate,  at  least 
for  ARMA  type  data. 

AR  AND  MA  LINEAR  TRANSFORMATIONS 

T 

Let  ■  (yQ . yfc_^)  denote  a  t-sample  snapshot  of  the  real,  zero  mean, 

wide  sense  stationary  process  (yfc )  with  correlation  sequence  (rt>.  Yt  has  a 

symmetric  and  Toeplitz  correlation  matrix  with  first  row  (ro,  . rt-l^ '  ^et 

T 

U  *  (u  . ...u_  ,)  be  a  white  vector  with  uncorrelated  entries  such  that 
t  o  t-l 

T 

E(Ut)  »  0  and  E  (UtU£)  ■  Ifc  (txt  identity).  The  AR  and  MA  linear  transformations 
are  defined  respectively  by  the  matrix  equations 


where  the  lower  triangular  matrices  A£  and  Ht  are  obtained  by  a  Gram-Schmidt 
orthogonalization  procedure.  These  matrices  can  be  computed  from  the  cor¬ 
relation  matrix  or  its  inverse  since  the  following  Cholesky  decompositions  hold 
and  are  unique: 


-1  T 
V  -  At  At 

AtHt  -  Tt 

Generalizing  the  linear  system  concept  of  frequency  response,  we  define  the 
"frequency  response"  of  the  AR  and  MA  linear  transformations  in  a  natural  way 


At  (  a)  )  *  At  C£  (  w  ) 
Ht(  w)  -  C*(u)  Ht 


C«(o>)  -  (1  ejW _ ejW  (t_1)) 


Case:  As  A£  is  a  whitening  transformation,  we  associate  with  the  AR  complex 


frequency  response  the  spectrum 


Rt(w)  ■ 


I  At(u)  |; 


C*(w)  R”1  Ct(u) 


This  spectrum  is  recognized  as  the  maximum  likelihood  method  spectrum.  In  the 
case  of  AR(p)  data,  the  AR  linear  transformation  A  is  A  : 


Superscript  (H)T  denotes  (Hermitian)  transpose. 


In  general  the  AR  transformation  matrix  A  can  be  approximated  by  A  where 

t  t ,  p 

the  matrix  is  completed  by  repetition  of  the  order  -p  row.  The  associated 
spectrum  is  called  a  parameterized  MLM  spectrum  and  has  been  shown  to  yield 
good  spectral  estimate's  in  the  case  of  AR  type  data.  (See  (5),  (6),  and  Figure  1 
of  this  paper) . 

MA  case:  The  spectrum  associated  with  the  MA  linear  transformation  is 
Rt(u>)  -  |Ht(w)|2  -  C*  <u>)  HtH^Ct(iD). 

This  spectrum  is  a  scaled  version  of  the  conventional  or  BA  spectrum  as  written 


in  (10)  •  If  the  data  were  MA(q),  then  the  MA  linear  transformation  matrix  H 


t,q 


would  be  banded  even  though  the  rows  would  remain  time  varying: 


H. 


\ 


'  hq 


0  V 


'  hq 
o 


H 


t,q 


A  parameterization  scheme  H  insuring  non-negative  spectra  is  proposed  and 

t,q 

associated  with  a  parameterized  BA  spectrum  in  (6).  (See  Figure  2). 


ARMA  LINEAR  TRANSFORMATIONS 


Suppose  the  underlying  process  (y£)  is  ARMA  (p,q): 

P  q 

Z  S1  Vj  “  1  b1  £t-j 
j-o  J  J  J-o  J  J 


Here  a  ■  b  -  1  and  £  is  a  white  noise  process.  The  snapshot  y„  has  a  cor 
o  o  t  t 

relation  matrix  R^  that  has  the  LU  decomposition 
*Y-HtHt  * 

We  define  the  AR  part  of  (y£)  as  the  AR  process  (z£)  with  coefficients  (aQ** 


A  snapshot  of  the  process  Z£  satisfies  the  AR  linear  transformation 


A 

t,p  t 


U 


where  U  is  a  vector  of  uncorrelated  data  and  A  is  given  by  the  Cholesky 
t  t,p 


decomposition 


-1  T 

R„  -  A  A 
Z  t,p  t,p 


Recall  A^  has  the  banded  structure  illustrated  earlier.  Applying  this  AR 

t,P 


linear  transformation  to  the  snapshot  Y£,  produces  a  finite  length  vector  X£ 
whose  correlation  matrix  Rx  has  the  following  structure  (7),  (8),  and  (9): 


*x 


At,p  Yt 


A  R  A" 
t,p  T  t,p 


The  matrix  R^  is  banded  Toeplitz  except  for  the  upper  left  (p  x  p)  corner. 
Thus  R^  is  almost  the  (t  x  t)  correlation  matrix  of  a  pure  MA(q).  However, 
R^  has  an  L  U  Cholesky  decomposition  as  follows: 


*x  -  Bx  Bx 


B  has  the  particular  lower  triangular  structure  (8) (9), 


This  decomposition  corresponds  to  an  MA  linear  transformation  applied  to 


Et  to  produce  Xfc: 


B  . 

t,q  t 


Here  E^  is  a  vector  of  uncorrelated  data.  As  a  result  the  ARMA  linear  trans¬ 
formation  is  summarized  by  the  matrix  relations 

A  Y  *  X  *  B  E^ 
t,p  t  t  t,q  t 

A  H  ■  B 
t,p  t  t,q 

It  is  not  hard  to  show  that  Ht  is  an  impulse  response  matrix  determined  by  the 
Kalman  gain  (7).  This  fully  characterizes  B_  and  sharpens  the  result  of 

t»q 

Newton  and  Pagano  (8),  (9). 

It  should  be  noted  that  B  is  exactly  a  banded  matrix  for  q  ^  p-1.  The  ma- 

q 

trix  At  p  is  easily  constructed  from  (aQ  ...  a^)  by  running  the  Levinson/Durbin 
algorithm  for  decreasing  j , 


k 

a  ■  a 

%  i 


k  >_p,  o  <*<k 


*i  *  <*r  -  •]«.!  >(i-u3«>2)'1  j<p-  °-i-j 

It  is  also  shown  in  (8)  that  the  last  row  of  B_  converges  to  the  vector  of 

t,q 

true  MA  coefficients  as  t  goes  to  infinity: 

lim  b^  *  b  .  o  <i<a 


ARMA  Spectrum:  If  Sx  ( w)  and  ( w )  denote  the  spectra  corresponding  to 
and  Y£,  a  plausible  definition  of  (u)  is 


SY(  u) 

V“> 


sx(“> _ 

cH<“)aI,a.pc<“) 


Hence  an  ARMA  spectrum  has  the  following  form: 

CH(  w)B  bJ  C(  cj ) 


SY(w) 


CH(id)A l  A  C<  ta>) 
t,p  t,p 


cVc 
'V  c 


This  spectrum  reduces  to  the  MLM  (BA)  spectrum  in  the  event  of  AR  (MA)  data. 
ARMA  Spectrum  Analysis: 

The  method  is  based  on  the  initial  computation  of  the  AR  coefficients.  For 
t  >p+q,  these  coefficients  are  computed  using  the  lower  left  corner  of  zeros 


inRx 


In  fact  this  corresponds  to  solving  the  normal  equations  on  the  tail: 

P 

Z  a.  r  (m-k)  *  -r  ( m).  m  > q+1 
k-1  K  y  y 

The  method  can  be  summarized  in  the  following  steps. 

-  Estimate  the  correlation  matrix  ,  using  the  covariance  method  of 
linear  prediction,  for  example. 

-  Solve  for  (ao...ap),  using  the  normal  equations  on  the  tail. 


-  Decompose  Ry  ■  using  a  fast  Choslesky  decomposition  routine 

-  Compute  15  “A  H 

K  t,q  t,p  t 

-  Compute  S„  <o>  )  -  C*V  C/CHA*  C 

i  t,q  t,q  t,p  t,p 


In  the  limiting  case. 


tlft.  V“> 


CH  b  bT  C 
Z3L  Z3. 

a  aT  C 
“P-P 


T  T 

where  b  ■  (b  ....b  )  and  a  ■  (a  ....a  ) 

— q  q  q  — p  o  p  • 

Relationship  With  The  "High  Resolution"  ARMA  Spectrum;  The  "high  resolution 


ARMA  spectrum  (4)  can  be  written  using  our  notation  as 


S(  a)  ) 


S£  (  u» 
CHa  aTC 

-p  -p 


where  a^  is  the  vector  of  AR  coefficients  and  is  computed  by  solving  the  normal 
equations  on  the  tail,  corresponding  in  fact  to  using  the  lower  left  corner  of 
zeros  in  R^.  S£(to  )  is  the  smoothed  periodogram  of  the  filtered  process 
(efc)  k  *  1  ....t  : 

P 

ek  *  yk  +  ill  ai  yk-i 

This  time  invariant  filter  can  be  expected  to  whiten  the  data  less  than  the 
linear  transformation  Afc  p  and  hence  yield  a  process  far  from  a  pure  MA(q). 

The  spectrum  proposed  here  is  consistent  with  both  the  MLM  and  the  BA  spectrum 
Furthermore  it  is  fairly  flexible  since  one  can  apply  a  parameterization  pro¬ 
cedure  on  the  AR  part  or  on  the  MA  part  or  both.  Different  sizes  can  be  chosen 

to  extend  the  matrices  A  and  B  depending  on  the  relative  importance  of 

t  •  P  t ,  q 

the  poles  and  zeros  in  the  spectrum.  This  should  give  better  control  of  the 
compromise  between  resolution  and  smoothness  and  provide  a  tool  for  analysis  of 
spectra  of  the  ARMA  type.  For  the  moment,  these  results  are  conjecture  based 
on  the  experience  gained  for  AR  and  MA  linear  transformations. 
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ABSTRACT 

A  general  framework  for  deriving  and  in¬ 
terpreting  analysis  and  synthesis  spectra  of 
the  autoregressive  (AR)  and  moving  average 
(MA)  type  is  presented.  Investigation  of  AR 
linear  transformations  of  finite  dimensional 
data  records  yields  a  set  of  intermediate  AR 
techniques  associated  with  approximation  of 
the  inverse  correlation  matrix  R~l.  The 
corresponding  spectrum  we  call  a  parameter¬ 
ized  maximum  likelihood  method  (pMLM)  spec¬ 
trum.  Investigation  of  MA  linear  transfor¬ 
mations  yields  a  set  of  Intermediate  MA 
techniques  associated  with  approximation  of 
the  correlation  matrix  R.  The  corresponding 
spectrim  we  call  a  parameterized  Bartlett 
spectrum  (pBA). 

Simulations  on  synthetic  AR,  MA  and  ARMA 
data  sets  illustrate  the  techniques  and  lead 
to  interesting  remarks  concerning  the  use  of 
parameterlzatlons  of  R  and  R-1  to  differen¬ 
tiate  between  data  sets  of  AR  and  MA  type. 

INTRODUCTION 

Modern  spectrum  estimation  is  primarily 
concerned  with  the  identification  of  parametric 
models  that  represent  an  underlying  random  pro¬ 
cess.  In  most  techniques,  the  concept  of  wide 
sense  statlonarlty  seems  to  underly  the  very  no¬ 
tion  of  a  spectrum.  A  stationary  time  series  is 
modeled  as  the  output  of  a  time  invariant  linear 
system  driven  by  a  stationary  white  noise  process. 
The  difficulty  with  stationary  models  is  that  in¬ 
itial  conditions  manifest  themselves  as  nuisance 
parameters.  The  problem  of  the  proper  initial¬ 
ization  can  be  aolved  by  identifying  non  atation- 
ary  models  where  the  inltal  conditions  are  absorbed 
naturally  into  the  theory. 

Pursuing  ideas  developed  in  [l] ,  we  present 
a  general  framework  for  deriving  and  interpreting 
analysis  and  synthesis  spectra  of  the  autoregres¬ 
sive  (AR)  and  moving  average  (MA)  type.  Investi¬ 
gation  of  AR  linear  transformations  of  finite 
dimensional  data  records  yields  a  aet  of  inter- 
•ediate  techniques  (parameterized  likelihood) 

[l] .  [2] ,  associated  with  the  maximum  likelihood 
method  (MLM)  [3]  and  maximum  entropy  method  (MEM) 

1*1 »  [5],  [6],  of  spectrum  analysis.  Investigation 
of  MA  linear  transformations  yields  intermediate 
MA  techniques  (parameterized  Bartlett)  associated 
with  an  approximation  of  the  correlation  matrix  and 
and  the  conventional  Bartlett  (BA)  spectrum  [7]  . 


Simulations  on  synthetic  date  sets  lead  to 
interesting  conclusions:  the  parameterlzatlons 
of  the  correlation  matrix  R  and  its  inverse  R-1 
seem  to  be  useful  for  deciding  whether  a  data  set 
is  of  the  AR  or  MA  type.  In  the  paper  we  use  term¬ 
inology  like  stationary  time  series,  snapshot, 
filter,  and  linear  transformation.  This  terminol¬ 
ogy  is  summarized  in  Fig.  1.  The  figure  is  re¬ 
ferred  to  throughout  the  text. 

Fig.  1.  u«  HIT  1  ua  nu  «h  oi  itaw  om 


AR  REPRESENTATIONS  OF  A  STATIONARY  TIME  SERIES 

In  this  section  we  briefly  recall  the  results 
developed  in  [l]  and  provide  an  alternative  inter¬ 
pretation  of  the  maximum  likelihood  method  spectrum. 

Let  (y£)  denote  a  real,  zero-mean  wide  sense 
stationary  sequence  with  real  lj  correlation  se¬ 
quence  (r£). 

Kolmogorov  Representation: 

The  sequence  (y  )  has  the  following  AR  (°°)  re¬ 
presentation 

_?  a  y_  ■  u_  a  +  0 

n«o  n  't-n  t  o' 

where  ut  is  a  white  noise  sequence  with  zero  mean 
and  unit  variance.  The  coefficients  (ai)  are  the 
AR  (“>)  filter  coefficients. 


The  first  order  descriptor  Is  resdily  obtained 
by  replacing  the  input  sequence  (ut)  by  (£t) , 
(6£-o  t  +o,  £„•!): 


s  h 

n  t-n 


for  all  t 


The  correlation  sequence ,  a  second  order  descrip¬ 
tor!  is  characterized  by 


J  »  t 

m*o  m  n*( 


r  a  ■  6 

'o  m-n+s  n  s 


The  specialization  to  the  AR(p)  case  is  straight¬ 
forward  . 


AR  or  Analysis  Transformations 

We  consider  now  a  finite  length  data  record 
Tt-(yo. . . .ytl)  of  the  process  (y  ) .  It  has 
a  symnetric  and  Toeplitz  correlation  matrix 


The  Inverse  correlation  aatrix  R~Hvas  a  UL 
Cholesky  decomposition  (Fig.  1): 


Defining  the  white  vector  U  as  E(U  )«o  and 
E(UtU  )  ■  I  (ixt  identity),  the  snapshot  Y 
has  the  AR  type  representation  (Fig.l).  ’’ 


Vt'ut 

The  row  vectors  a*  of  A  are  the  order-s  HAwhite- 
ners  or  analyzers  and  the  column  vectors  are 
Interpreted  as  impulsive  exciters  [l].  ± 8  is  also 
known  as  the  impulse  response  for  the  AR  filter 
that  corresponds  to  the  order-s  HA  synthesizer 
h®. 

The  first  order  descriptor  is  obtained  by  sub¬ 
stituting  1£  for  Ut  to  obtain 

AtHt  ■  lt 

where  H(  is  the  matrix 


As  AY  «  U  describes  a  whitening  operation  on 
Yt,  replacing  Y{  by  Ct(u)  yields 


At(u)  -  AtCt(a)) 
C*(w)  -  (  1  e+3“ 


eiu(t-l)) 


This  is  a  column  vector  of  phased  complex  fre¬ 
quency  responses  for  the  MA  vhlteners  a5: 

At(w)  -  (  a° (uj) . at_1(u))) 

a®(w)  -  a®  C£(w) 

The  spectrum  associated  with  Y  is  the  inverse  of 
the  spectrum  associated  with  the  whitening  trans¬ 
formation  A£  (Norm  of  the  complex  frequency  re¬ 
sponse)  : 

Rt  (“}  '  |At(u) | 7  “  C«(w)R-iCt (u) 

.  —1 _ 

CI  |  a*(u)|2 

8«0 

This  spectrum  has  the  following  interore tat  ions : 

-Rt(w)  is  the  maximum  likelihood  method  (MLM) 
spectrum  [3]  . 

-  The  inverse  of  Rt(w)  is  an  average  of  order 
increasing  whitening  spectra  [s] . 

-  If  only  the  (t-1)  term  is  used  in  the  sum¬ 
mation  above  (corresponding  to  a  very  specific 
weighting  pattern),  the  maximum  entropy  method 
(MEM)  spectrum  is  produced  [4]. 

-  Using  the  second  interpretation  of  the  column 
vector  ag  ,  the  MLM  spectrum  is  also  the  Fourier 
transform  of  the  diagonal  sums  of  a  correlation 
sequence.  This  correlation  is  built  on  the  impulse 
response  sequence  for  the  AR  filters  that  corre¬ 
spond  to  the  order-s  MA  synthesizers  (o<_s<_t-l). 


Parameterized  AR  Transformation 

If  the  process  (y  )  is  AR  (p) ,  then  in  the 
Levinson/Durbin  algorithm,  the  reflection  coeffi¬ 
cients  converge  to  0  in  exactly  p  steps.  Hence 
for  t>p,  the  a*  vectors  repeat  themselves  and 
the  AR  transformation  specializes  as  follows: 

r 


The  results  for  AR(p)  linear  transformations  sug¬ 
gest  that  the  AR  transformation  A  may  be  approxi¬ 
mated  by  the  pth  order  approxlmant  At„  [ l] ,  [2]. 
The  associated  spectrum  is  a  parameterized  maxi¬ 
mum  likelihood  spectrum.  Its  properties  have  been 
explored  in  [l],  [2],  [9]. 


MA  REPRESENTATIONS  OF  A  STATIONARY  TIME  SERIES 

The  same  idea*  translate  directly  to  the  MA 
case  where  the  LU  Cholesky  deconposition  of  the 
correlation  matrix  R  plays  a  key  role  (Flg.l). 

This  yields  an  Intermediate  spectrum  called  the 
parameterized  Bartlett  spectrum  (pBA) . 

Wold  Decomposition 

The  Wold  decomposition  for  the  sequence  (yt> 

Is  the  Infinite  moving  average  (MA(“) ) : 

y*  ?  h  u 
1  t  n*o  n  t-n 

The  coefficients  (h  )  are  the  MA(»)  filter  coeffi¬ 
cients.  They  are  aSso  the  impulse  response  se¬ 
quence  since  substituting  (6£)  for  (u£)  yields: 

ht  »  h£  t  >  o 

h£  ■  0  ow 

The  correlation  sequence  (second  order  descriptor) 
is  then  written  in  terms  of  the  first  order  des¬ 
criptor: 

rt  '  X  hnhn+|t|  ^  all  t 

MA  or  Synthesis  Transformation 

As  in  the  AR  case,  we  consider  only  the  snap¬ 
shot  Y  .  The  corresponding  R  ,  symmetric  and 
Toepllt2  correlation  matrix  has  the  LI!  Cholesky 
decomposition 

Rt  -  HtH« 

where  H£  is  the  lower  triangular  taatrlx  introduced 
in  the  second  section.  The  sth  column  vector  hs 
is  the  order-s  MA  synthesizer  and  the  sth  column 
vector  hgis  the  impulse  response  of  an  MA  linear 
transformation  to  an  impulse  applied  at  time  s. 
hg  is  also  the  impulse  response  for  the  AR  filter 
corresponding  to  the  order-s  MA  whitener. 

The  finite  length  data  record  Y(  has  the  MA 
type  representation 

*t  ■  Vt 

and  the  first  order  descriptor  is  obtained  by  re¬ 
placing  Ut  by  It:  H£  -  Ht.  The  frequency  response 

of  the  MA  transformations  is 

Ht  (u)  -  c"(u)Ht 

This  is  a  row  vector  of  phased  complex  frequency 
responses  for  the  impulse  responses  (h  (u),.... 
ht_i(u>).  As  in  the  AR  linear  transformation  case, 
we  define  a  spectrum 


R;(<o)  -  |Ht(w)|*-C*(ui)RtCt(u))-  V  *«#(u>)|* 

The  spectrum  R'(u)  has  then  the  following  inter¬ 
pretations  : 

i  i_ i  , _ 


-  f  r;(uo 


r  (i_i£i w  e*10" 
n--(t-l)U  t  ,rn 


-  This  is  the  conventional  or  Bartlett  spectrum 

Oj- 

-  It  is  also  an  average  of  magnitude  squared 


frequency  responses. 


This  is  the  reason  for  the  extreme  smoothness  of 
the  conventional  spectrum. 

-  It  is  also  interpreted  in  terms  of  the  impulse 
responses  for  the  AR  filters  corresponding  to  the 
order-s  MA  whiteners  (o  <_s  <_t-l) .  It  is  the 
Fourier  transform  of  the  diagonal  sums  of  the  cor¬ 
relations  built  on  these  impulse  responses. 


Parameterized  MA  Transformations 


As  in  the  AR  case,  the  parameterization  is 
suggested  by  the  particular  structure  of  the  ma¬ 
trix  Ht  in  the  MA(q)  case.  Suppose  the  original 
process  (yt)  is  MA(q).  By  running  the  impulse  re¬ 
sponse  algorithm  we  can  generate  the  column  vectors 
of  Rt.  The  algorithm  in  that  case  produces  time 
varying  impulse  responses  of  maximum  length  q. 
Hence  the  MA  transformation  specializes  as  follows: 


The  order-s  MA  synthesizers  hs  stop  their  growth 
at  S“p  but  remain  time  varying  .  h®  converges  to 
the  true  stationary  MA(q)  synthesizer  only  for 
s  "*  °°.  This  is  a  ma)or  difference  with  the  spe¬ 
cialization  of  the  AR  transformation  for  the  AR(p) 
case. 


The  parameterization  suggested  corresponds  to 
applying  a  rectangular  window  on  the  correlation 
sequence  and  then  using  the  Impulse  response  al¬ 
gorithm.  This  procedure  can  lead  to  negative 
spectra  that  are  meaningless,  even  though  the 
Toeplitz  nature  of  the  approximant  is  preserved. 

We  propose  the  following  approximation: 


This  approximation  of  H£  does  not  preserve  the 
Toeplitz  property  but  insures  against  negative 
spectra.  As  in  the  AR  case,  it  corresponds  to 
applying  more  weight  to  the  q-order  MA  synthesizer. 
The  spectrum  associated  with  H(  is  a  parameter¬ 
ized  Bartlett  (BA)  spectrum: 


NUMERICAL  RESULTS 
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The  Influence  of  parameterization  on  Che 
MLM  and  BA  spectra  la  Investigated  for  three 
different  sets  of  synthetic  data.  For  each 
set  of  data  the  correlation  sequence  is  esti¬ 
mated  using  the  unbiased  estimator  on  4096 
points.  The  different  spectra  are  calculated 
using  21  lags  of  the  correlation  sequence. 

The  exact  spec tr us  and  the  MEM  spectrum  are 
plotted  as  references. 

The  first  act  of  data  considered  is  the 
AR(4)  process  used  in  [l] .  It  has  tvo  real 
poles  at  +  .94  and  a  pair  of  complex  conjugate 
poles  at  +  J  .94.  Fig.  2  shows  the  classical 
spectra  corresponding  to  a  covariance  matrix  of 
order  21.  Fig.  3  ahows  what  happens  when  the 
MLM  and  BA  spectra  are  parameterized  at  (4,21). 

It  is  readily  noted  that  the  parameterization  of 
the  BA  spectrum  changes  a  great  deal,  whereas  the 
MLM  spectrum  appears  almost  unaffected. 

The  second  set  of  data  is  an  MA(4)  with  two 
real  zeros  at  +  .94  and  a  pair  of  complex  conju¬ 
gate  zeros  at  +  j  .94.  Figs.  4-5  show  the  dif¬ 
ferent  spectra  for  the  same  parameterization  as 
before.  The  MEM  spectrum  of  order  21  has  spurious 
peaks.  Fig.  5  ahows  the  dramatic  changes  obtained 
by  parameterization  of  the  MLM  spectrum.  Little 
effect  is  seen  in  the  parameterized  BA  spectrum. 

Finally  the  same  experiment  is  carried  out 
for  a  set  of  ARMA  (2,1)  data  with  a  pair  of  complex 
conjugate  poles  at  .66+J.49  and  a  real  zero  at  .33. 
Here  again  we  note  the  effect  of  parameterization. 
Both  the  BA  and  the  MLM  are  affected,  but  the  effect 
on  the  BA  spectrum  seems  qualitatively  much  more 
Important.  For  the  MLM,  the  effects  are  evident 
only  the  low  frequencies. 

We  can  suamarlze  the  results  in  the  following 
way.  The  selection  of  the  right  technique  in 
parametric  spectrum  analysis  depends  a  great  deal 
on  the  type  of  data  being  analyzed.  An  MA  type 
technique  will  be  much  more  suited  to  the  analysis 
of  data  of  the  MA  type  than  an  AR  technique.  This 
is  obvious.  Furthermore,  once  the  right  technique 
has  been  selected  it  is  seen  that  parameterization 
has  little  effect  on  the  resulting  spectrum.  These 
results  aeem  to  provide  a  method  to  differentiate 
between  data  sets  of  the  MA,  AR  or  ARMA  type.  One 
has  to  compare  the  parameterized  MLM  and  Bartlett 
spectra  to  the  non-parameterlzed  ones  to  decide 
which  one  has  been  affected  most. 

CONCLUSIONS 

We  have  presented  a  general  framvork  for  de¬ 
riving  and  interpreting  analysis  and  synthesis 
spectra  of  the  AR  and  MA  type.  Along  the  lines  of 

[l]and[2]ve  have  introduced  a  parameterized  Bartlett 
spectrum.  Here  we  haven't  been  concerned  with  the 
very  important  problem  of  order  fitting.  An  order 
fitting  rule  (J-Dlvergence)  was  proposed  in  [l]. 
Discussions  of  MA(q)  and  AR(p)  linear  transfor¬ 
mation  generalize  to  ARMA(p,q)  transformations. 

Such  transformations  underly  lattice  and  inno¬ 
vations  representations  of  stationary  time  scries 
and  are  the  object  of  future  work. 
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